
  • title: algorithms
  • date: 7/14/14-9/3/14
  • time: M & W 10am - 1pm
  • affiliation: Columbia University, Lede Program
  • instructors: Jonathan Soma, Chris Wiggins
  • location: 607c Pulitzer Hall *

Multiliteracies in algorithms: functional literacy, critical literacy, and rhetorical literacy. Within critical literacy, a strong emphasis will be on knowing what is possible. For algorithms, this usually means computational complexity -- the study of how the time needed to run an algorithm grows as the problem size (e.g., the number of data points) grows. For algorithms dealing with data, we will study how this leads to a trade-off between speed and accuracy. Within functional literacy, we will be building on Python's tools for learning from data, including scikit-learn. Rhetorical literacy will be the anchor for the class, as our primary interest is in producing technology-enabled journalism.
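
As a rough illustration of computational complexity (not taken from the course materials), the sketch below counts comparisons for a linear scan and for binary search as the list grows. The function names are hypothetical and the data is just a range of integers.

```python
def linear_search_steps(items, target):
    """Count the comparisons a left-to-right scan makes before it stops."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the comparisons binary search makes on a sorted list."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (10, 1_000, 100_000):
    data = list(range(n))
    last = n - 1  # the last element is the worst case for a linear scan
    print(n, linear_search_steps(data, last), binary_search_steps(data, last))
```

The linear scan's count grows in step with the list size, while the binary search count grows roughly with its logarithm.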


"every piece of digital technology embeds within it a model of the world, and acts as an argument for that model." --mark hansen


Schedule and notes:

Week 1: Intro to Algorithms

  • What is an algorithm?
    • Algorithms in computer science (searching, sorting, clustering)
    • Algorithms in real life
  • Algorithmic thinking
    • Step after step
    • Reductions/Black boxes
  • Multiliteracies
    • Functional literacy
    • Rhetorical literacy
    • Critical literacy
  • Summary of projects
    • Documentation
    • Agile vs Waterfall
  • Analysis of algorithms
    • Computationally (Functionally)
      • Correctness, Termination, Time, Space
      • Generality
    • Critically (Nick Diakopoulos)
      • Prioritization
      • Classification
      • Association
      • Filtering
  • Examples of algorithms in journalism
    • QuakeBot
    • Narrative Science/Automated Insights
    • Projects from last class

Day One Links

Wednesday 7/16

  • Introduction to first in-class project: building a democrat detector

Course tools: scikit-learn, pandas, nltk, capitolwords.org's API (you will need to register for a key)

-Week Inspiration: Diakopoulos Report

Week 2: Supervised learning

Focus: modeling: predictive and interpretable

-Week Inspiration: Nifty project on authorship detection

Monday 7/21

overview/concepts:

  • algorithms that learn from data to model the world (i.e., machine learning)
  • the role of optimization in those algos
  • representation (e.g., documents)
  • examples: reading aloud the authorship nifty assignment
  • another example: bag of words
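
A bag of words represents a document by its word counts and ignores word order. The snippet below is a minimal sketch using only the standard library. The tokenizer (lowercase, letters and apostrophes only) and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map each lowercase word in the text to the number of times it occurs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


doc = "The cat sat on the mat. The mat was flat."
bag = bag_of_words(doc)
print(bag.most_common(2))
```

Two documents can then be compared through their counts, which is the representation a classifier such as naive Bayes typically consumes.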

math:

  • introduce naive Bayes
  • introduce probability and Bayes rule
  • go through naive Bayes
  • show how it's a graphical model (pictures, organizing stories in your head, a chance to talk about complexity)
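
To make the naive Bayes steps concrete, here is a small from-scratch sketch. It is not the course's implementation: it assumes word-count features, add-one (Laplace) smoothing, and a made-up toy training set with hypothetical labels. It scores each label by log P(label) plus the sum of log P(word | label).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (list_of_words, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, words):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue  # skip words never seen in training
            # add-one smoothing keeps unseen (word, label) pairs above zero
            likelihood = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
training = [
    ("game team win".split(), "sports"),
    ("team score goal".split(), "sports"),
    ("rain cloud wind".split(), "weather"),
    ("sun rain storm".split(), "weather"),
]
model = train_naive_bayes(training)
print(predict(model, "team win goal".split()))
```

The "naive" part is the assumption that words are independent given the label, which is what lets the per-word log-probabilities simply add up.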

extensions:

  • say but don't show how you could do this with priors and for multiclass
  • talk about other classification algorithms
  • how do you decide what algorithm or priors are "best"?
  • digression on meaning of modeling and desiderata of models

Fun data to play with

-Week Inspiration: what is Bayes theorem

Wednesday 7/23

  • k-nearest neighbors (predicting from examples)
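
A minimal sketch of k-nearest neighbors, assuming numeric feature tuples, Euclidean distance, and a majority vote among the k closest examples. The points and labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(examples, query, k=3):
    """examples is a list of (point, label); return the majority label of the k nearest."""
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
examples = [
    ((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
    ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue"),
]
print(knn_predict(examples, (2, 2)))
print(knn_predict(examples, (7, 9)))
```

There is no training step here: the stored examples are the model, so every prediction costs a pass over the data.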

Week 3: Probability and statistics

Monday 7/28

Possibly useful: Bayes Rule
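
As a worked example of Bayes' rule, P(H | E) = P(E | H) P(H) / P(E), the sketch below uses exact fractions. The numbers (a 1% prior, a 90% true-positive rate, a 10% false-positive rate) are hypothetical, as is the function name.

```python
from fractions import Fraction


def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from the prior and the two conditional probabilities."""
    numerator = p_evidence_given_h * prior
    # total probability of the evidence, summed over H and not-H
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence


result = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(result)  # 1/12
```

Even with fairly strong evidence, the posterior stays low here because the prior is so small.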

Wednesday 7/30

  • supervised learning/classification with probability modeling

Week 4: Unsupervised learning

Focus: Exploratory data analysis, iterative algorithms (and therefore fast-vs-accurate)

Monday 8/4

opening questions:

new matters:

thoughts on UNIX and algorithms in your life:

-Week Inspiration: Krugman busts out probability

Wednesday 8/6

Week 5: Nifty projects

Monday 8/11

Wednesday 8/13

(note: lots of room for critical literacy here)

Week 6: Algorithmic story generation

Monday 8/18

  • Input, Output, Precision, Determinism, Finiteness, Correctness, Generality
  • Prioritization, Classification, Association, Filtering

Quakebot: on Source, on Slate

Storytelling

What is a story? What's in a story?

Cinderella tales, examples: 1, 2, 3

NYT: Mike Brown's autopsy, PWC fined, Germany + the American Old West, Palin and Oil, Iraq retakes dam

What's your angle? Trends, correlations, inflection points

Propublica's Opportunity Gap

Writeup: How To Edit 52,000 Stories at Once

Wednesday 8/20

For reference

Our notes

Week 7: Networks and graphs

Monday 8/25

things we'll use today:

deep thoughts/tangents:

  • data journalism is not qualitatively different from other journalism. they're both awesome because they involve thinking clearly. they're both limited by subjective choices, including design choices and process choices.
  • great quote related to the above, from a post by a stats grad student about a MOOC on data driven journalism.

I loved some of the language that came up, such as "backgrounding the data" -- analogous to checking out your sources to see how much you can trust them -- or "interrogating the data," including coming prepared to the "data interview" to ask thorough, thoughtful questions. I'd love to see a Statistics 101 course taught from this perspective. Statisticians do these things all the time, but our terminology and approach seem alien and confusing the first few times you see them. "Thinking like a journalist" and "thinking like a statistician" are not all that different, and the former might be a much more approachable path to the latter.

Possibly useful

Wednesday 8/27

  • Graphs
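
A graph can be stored as a dictionary that maps each node to its neighbors. The sketch below assumes that representation and uses breadth-first search to find a shortest path, with a made-up social network as input. The names and the function are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is not reachable from start


# Toy undirected graph, invented for illustration only.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
print(shortest_path(graph, "ann", "eve"))
```

Breadth-first search visits nodes in order of distance from the start, so the first path that reaches the goal uses the fewest edges.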

Week 8: Final project demo

Monday 9/1

  • No class! (Labor Day)

Wednesday 9/3

  • Demos

additional resources

scikit-learn

a book

some fun data, none of which has an API

Readings

Python

Data sets
