## project-proposal-RJ
# From Objects to Data - Project Proposals

Robert-Jan Korteschiel
St.nr: 10399143
E-mail: rjkorteschiel@gmail.com

## 1. Metadata -> Comparison

How much media attention does each field within the Humanities get from the New York Times? How many words do the individual articles get, and on which page do they appear? How can these values be used to measure the level of attention? Has this changed over time? Are the numbers representative of the size of the profession (perhaps measured by economic figures or by student counts)? How does the coverage compare to that of other newspapers?

Result:
Could we conclude from the results whether the NYT's coverage is opinionated or balanced?

Approach:
1. Define search words to distinguish each field
2. Define a system by which to value the hits appropriately
3. Plot the resulting values against time
4. Compare them with datasets from other newspapers
5. Compare them with economic figures or student counts for each profession
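
Steps 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not a settled design: the field names, the search words, the article fields (`text`, `word_count`, `page`, `year`) and the weighting formula (word count divided by page number) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical search words per field (step 1); real lists would need curation.
FIELD_KEYWORDS = {
    "history": {"historian", "archive"},
    "linguistics": {"linguist", "grammar"},
}


def attention_score(word_count, page):
    """Assumed weighting (step 2): longer articles on earlier pages score higher."""
    return word_count / page


def fields_for(text):
    """Return every field whose search words occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [field for field, keywords in FIELD_KEYWORDS.items() if words & keywords]


def attention_by_year(articles):
    """Sum the scores per (field, year) pair, ready to plot against time (step 3)."""
    totals = defaultdict(float)
    for article in articles:
        score = attention_score(article["word_count"], article["page"])
        for field in fields_for(article["text"]):
            totals[(field, article["year"])] += score
    return dict(totals)


# Made-up sample records, only to show the expected input shape.
sample_articles = [
    {"text": "A historian opens a lost archive.", "word_count": 800, "page": 1, "year": 1990},
    {"text": "Why grammar still matters", "word_count": 300, "page": 12, "year": 1990},
    {"text": "The historian and the city", "word_count": 600, "page": 3, "year": 2000},
]

totals = attention_by_year(sample_articles)
```

The resulting dictionary maps each `(field, year)` pair to a total score, which is the shape needed for the comparisons in steps 4 and 5.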

## 2. Images -> Photogenic Studies
Is there a relationship between the available images and the fields of study? Are some fields more 'photogenic' than others? Do they get more attention because of it? Are there notable differences in the images: are they bigger, or perhaps more colorful?

Result:
Do fields that study visual material attract more attention?

Approach:
1. Define search words to distinguish each field
2. Create a system by which to value each article / image
3. Refine the system further (size, color, or other properties)
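
As a possible starting point for steps 2 and 3, the sketch below values an image by its size and by a simple colorfulness measure. The choice of mean HSV saturation as "colorfulness" and the representation of an image as a plain list of RGB tuples are assumptions for illustration; a real pipeline would read pixels from image files.

```python
import colorsys


def mean_saturation(pixels):
    """Average HSV saturation of (r, g, b) pixels with channels in 0..255."""
    if not pixels:
        return 0.0
    total = 0.0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation
    return total / len(pixels)


def image_features(width, height, pixels):
    """Hypothetical per-image record: area in pixels plus a colorfulness value."""
    return {"area": width * height, "colorfulness": mean_saturation(pixels)}


# A tiny 2x1 "image": one pure red pixel and one gray pixel.
features = image_features(2, 1, [(255, 0, 0), (128, 128, 128)])
```

A grayscale image scores 0 on this measure and a fully saturated one scores 1, so fields can be compared on average image size and average colorfulness.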

## 3. Individual Articles -> Categorization tool

How do the individual articles correspond with the metadata? Is it possible to deduce the categorizations from the text? Could we create a tool that assigns categories automatically? Are the assigned categories appropriate, or are other structures more useful? Which category structures require a human to assign them, and which can be assigned automatically?

Result:
Gain insight into how meaningful (automatic) categorization is.

Approach:
Iterative, as this task will probably never be finished. The first step could be to create a tool that takes the most frequently used word in the text, excluding the least informative ones (such as "the" or "and"), and assigns it as a keyword. Subsequent steps could refine the process further.
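
The first iteration described above might look like this. It is a minimal sketch: the stop-word list is a tiny illustrative assumption, and `suggest_keyword` is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: a very small list of uninformative words; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}


def suggest_keyword(text):
    """Return the most frequent word that is not a stop word, or None if there is none."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


keyword = suggest_keyword("The museum opened. The museum is in the city of art.")
```

Comparing such suggested keywords with the categories already present in the metadata would give a first impression of how far automatic assignment can go.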