@Zaturrby
Last active August 29, 2015 14:06
# From Objects to Data - Project Proposals
Robert-Jan Korteschiel
St.nr: 10399143
E-mail: rjkorteschiel@gmail.com
## 1. Metadata -> Comparison
How much media attention does each field within the Humanities get from the New York Times? How many words do the individual articles get, and on which page do they appear? How can these values be used to measure the level of attention? Has this changed over time? Are the numbers representative of the size of each profession (perhaps measured by its economic weight or by student counts)? How does it compare to other newspapers?
Result:
Could we conclude from the results whether the NYT's coverage is opinionated or balanced?
Approach:
1. Define search words to distinguish each field
2. Define a system by which to value the hits appropriately
3. Plot it against time
4. Compare it with datasets from other newspapers
5. Compare it with economic figures for each profession or with student counts
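
Steps 1 to 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the search terms in `FIELD_TERMS`, the article fields (`text`, `word_count`, `page`, `year`) and the weighting in `attention_score` are all hypothetical and not taken from any NYT dataset.

```python
from collections import defaultdict

# Hypothetical search terms per field (step 1); real lists would need careful curation.
FIELD_TERMS = {
    "history": ["historian", "history"],
    "philosophy": ["philosopher", "philosophy"],
}


def classify(text):
    """Return the fields whose search terms occur in the text."""
    lowered = text.lower()
    return [field for field, terms in FIELD_TERMS.items()
            if any(term in lowered for term in terms)]


def attention_score(word_count, page):
    """Assumed weighting (step 2): longer articles and earlier pages count more.

    Expects page numbers starting at 1.
    """
    return word_count / page


def attention_by_year(articles):
    """Sum the scores per (field, year) so they can be plotted against time (step 3)."""
    totals = defaultdict(float)
    for article in articles:
        score = attention_score(article["word_count"], article["page"])
        for field in classify(article["text"]):
            totals[(field, article["year"])] += score
    return dict(totals)
```

Dividing the word count by the page number is just one possible valuation; the totals per field could afterwards be divided by student counts to address step 5.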
## 2. Images -> Photogenic Studies
Is there a relationship between the available images and the fields of study? Are some fields more 'photogenic' than others? Do they get more attention because of it? Are there notable differences between the images: are some bigger, or perhaps more colorful?
Result:
Do fields that study visual material attract more attention?
Approach:
1. Define search words to distinguish each field
2. Create a system by which to value each article / image
3. Refine the system further (size, color, or other properties)
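
One way to value an image by size and color (steps 2 and 3) is sketched below. It assumes the image is already available as a width, a height and a list of `(r, g, b)` pixels with channels from 0 to 255; the weights and the use of mean HSV saturation as a stand-in for 'colorful' are assumptions, not an established measure.

```python
import colorsys


def colorfulness(pixels):
    """Mean HSV saturation of (r, g, b) pixels; 0.0 means pure greyscale."""
    if not pixels:
        return 0.0
    total = 0.0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation
    return total / len(pixels)


def image_score(width, height, pixels, size_weight=1.0, color_weight=1.0):
    """Assumed scoring: area in megapixels plus mean saturation, each with a tunable weight."""
    megapixels = width * height / 1_000_000
    return size_weight * megapixels + color_weight * colorfulness(pixels)
```

Comparing the average `image_score` per field would give a first indication of whether some fields are illustrated with larger or more colorful images.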
## 3. Individual Articles -> Categorization tool
How do the individual articles correspond with the metadata? Is it possible to deduce the categorizations from the text? Could we create a tool that assigns categories automatically? Are the assigned categories appropriate, or are other structures more useful? Which category structures require a human to assign them, and which can be done automatically?
Result:
Gain insight into how meaningful (automatic) categorization is.
Approach:
Iterative, as this task will probably never be finished. A first step could be a tool that takes the most frequent word in the text, excluding the least informative ones (such as stop words), and assigns it as a keyword. Subsequent steps could refine the process further.
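
A minimal sketch of that first step might look like this. The `STOP_WORDS` set is a tiny, assumed list for illustration; a real tool would need a much fuller one, and the tokenization here is deliberately naive.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real tool would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "it",
              "that", "for", "on", "with", "as"}


def top_keyword(text, stop_words=STOP_WORDS):
    """Return the most frequent word that is not a stop word, or None if there is none."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Later iterations could return several keywords instead of one, or compare the result with the categories already present in the metadata.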