Where is history on the web and how’d it get there?
This week we’re thinking about the pieces that make up the internet, and how to put stuff on it.
This week we will:
- Think about what mass digitization means for the historical profession
- Think about what we can and can’t know as historians and why
- Learn how to “read” data
Module Outline
- Wednesday agenda
- Discussion Starter
- Assignment: Data Critique
- Assignment: Data Cleaning
Wednesday agenda
During our Wednesday meeting, we’ll troubleshoot the Data Cleaning assignment and answer questions about the Data Critique assignment listed below in Tasks.
Discussion Starter
[What is data video]
- Michel-Rolph Trouillot, pp. 1–31 [PDF download]
- Lara Putnam, “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast,” The American Historical Review, Volume 121, Issue 2, (April 2016): 377–402, https://doi.org/10.1093/ahr/121.2.377
After reading all the materials and watching the discussion starter video, respond on the #module2 Slack channel using the 3CQ method: compliment, comment, connection, question.
Compliments should emphasize something you liked about what a discussion starter said or what the group discussed. Comments should reinforce, but also deepen, an idea they shared. Connections should link what the discussion starters talked about to your own thought or reaction, extending the discussion to new ideas, examples, or concepts. Finally, questions should open up space for class discussion. They may challenge something the discussion starters said, raise a question based on the readings, complicate an existing example or idea, or point us to something in the readings that the discussion starters overlooked.
Remember that I’m not requiring you to respond to a certain number of classmates, but you will get as much out of this class as you put in. Talking to one another will help you deepen your own understanding!
Tasks
- Read the assigned materials and respond in the #module2 Slack channel to the discussion starter video using the 3CQ method above.
- Do the Data Critique assignment below
- Post your data critique to the shared Google Sheet. You will need to fill out a separate row for every column in your dataset!
- Do the Data Cleaning assignment below
- In the comments on this post, use pretty links (<a href>) to link to the CSV and JSON files on your GitHub
You may have noticed over the past two weeks that I sometimes give very specific instructions for where and how to link your homework assignments. As we progress through the semester, we’ll start doing programming tasks or working with programs where steps must be done in a particular order, even when that order seems arbitrary at the time. I want you to get used to following directions carefully so that you build the skill of being precise in your work!
Assignment: Data Critique
You will be assigned two sample datasets. It may be helpful to review last week’s data filtering lesson and the Working with Data assignment.
For this assignment, you will need to examine what information is in your dataset, what kind of events, people, or phenomena your dataset describes, and what it cannot describe. Use a spreadsheet filter to get an idea of what kind of data you’re working with. What’s the scope of your data temporally, geographically, in number of records, or in other dimensions?
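A spreadsheet filter is all this assignment requires, but if you are curious how the same scope questions look in a script, here is an optional sketch. It assumes your data is a CSV file with dates written as YYYY-MM-DD; the column names `date` and `place` and the sample rows are invented placeholders, not part of any assigned dataset.

```python
import csv
import io
from datetime import date

# Invented sample rows standing in for your own CSV (hypothetical columns).
SAMPLE_CSV = """id,date,place
1,1801-03-04,Town A
2,1799-11-20,Town B
3,,Town A
"""


def profile_scope(rows, date_col, place_col):
    """Summarize the number of records and their temporal and geographic range."""
    # Keep only non-blank dates; assumes ISO format (YYYY-MM-DD).
    dates = [
        date.fromisoformat(r[date_col].strip())
        for r in rows
        if (r.get(date_col) or "").strip()
    ]
    # Collect the distinct, non-blank place names.
    places = {r[place_col] for r in rows if (r.get(place_col) or "").strip()}
    return {
        "records": len(rows),
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
        "missing_dates": len(rows) - len(dates),
        "places": sorted(places),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(profile_scope(rows, "date", "place"))
```

To try it on a real file, you could replace the sample with `with open("your_dataset.csv", newline="", encoding="utf-8") as f: rows = list(csv.DictReader(f))`, where the filename is a placeholder. Dates in other formats (for example `3/4/1801`) will raise a `ValueError` and would need `datetime.strptime` instead. Note that the count of missing dates is itself part of the critique: it tells you what the data cannot describe.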
What’s the “thing” that composes a row? Is a row a person, an event, an object, something else? What attributes are documented by the columns? Is there any kind of column missing that you might expect given the kind of “thing” the row documents? For example, if we have a row describing a person, it might be unusual that it doesn’t have a column for gender.
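The questions about rows and columns can also be checked in a few lines of code. This optional sketch assumes a CSV with hypothetical columns (`id`, `name`, `birth_year`, `occupation`) and invented sample rows. It checks whether a column has a distinct value in every row (a hint about what one row represents), counts blank cells per column, and lists attributes you expected but that have no column at all, such as the gender example above.

```python
import csv
import io

# Invented sample rows; the column names are hypothetical placeholders.
SAMPLE_CSV = """id,name,birth_year,occupation
1,Person A,1780,farmer
2,Person B,,
3,Person C,1791,blacksmith
"""


def blank_counts(rows, fieldnames):
    """Count how many rows leave each column empty."""
    return {
        col: sum(1 for r in rows if not (r.get(col) or "").strip())
        for col in fieldnames
    }


def missing_columns(fieldnames, expected):
    """List expected attributes that have no column at all."""
    return [col for col in expected if col not in fieldnames]


def is_unique(rows, col):
    """True if every row has a distinct value in `col`."""
    values = [r.get(col) for r in rows]
    return len(values) == len(set(values))


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
fields = reader.fieldnames
print(blank_counts(rows, fields))
print(missing_columns(fields, ["name", "gender", "birth_year"]))
print(is_unique(rows, "id"))
```

A unique `id` column only tells you that each row is a separate record; deciding whether that record is a person, an event, or an object still depends on reading the other columns.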
As best you’re able to determine, you should also describe how the data was generated, what the original sources were, how the data was collected, and how your data is divided. What is an individual record row? How is the data divided into columns and why? If this dataset were your only source, what kind of information would be left out?
To post your data critique, open the shared Google Sheet and fill out one row in the Dataset tab for each dataset and one row per column of your datasets in the Row descriptions tab. You will need to fill out a separate row for every column in your dataset! See the example rows for how I did this assignment for the Albany Manumissions dataset.
To get the name of each column in your dataset into the shared Google Sheet, it may help to start at the last column name in your header row, select all the column names with cmd + shift + arrow left (Mac) / ctrl + shift + arrow left (PC), copy them, and paste them transposed (using paste special) into the field names column. This saves you from typing every field name by hand.
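If the keyboard shortcut and transpose step give you trouble, a short script can print the header row one name per line, which pastes straight down a single spreadsheet column. This optional sketch uses Python’s built-in `csv` module; the sample header is invented, and the function accepts any open file, so it works the same on a real CSV.

```python
import csv
import io

# Invented sample file contents; only the header row matters here.
SAMPLE_CSV = "id,name,birth_year,occupation\n1,Person A,1780,farmer\n"


def column_names(fileobj):
    """Return the first (header) row of a CSV file as a list of field names."""
    return next(csv.reader(fileobj))


# One name per line, so the output pastes down a single spreadsheet column.
print("\n".join(column_names(io.StringIO(SAMPLE_CSV))))
```

For a real file, you could call `column_names(open("your_dataset.csv", newline="", encoding="utf-8"))`, where the filename is a placeholder, and copy the printed lines into the sheet.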
This assignment is adapted from Miriam Posner’s Data Critique.
Assignment: Data Cleaning
You will need to download and install OpenRefine for this assignment. Please email or ask on Slack if you run into difficulties with the installation.