Well done on the discussion starter videos. All groups’ videos are now up on Blackboard or embedded on the course site, and they all look really good. This week we have a discussion starter on the Trouillot and Putnam pieces up on Blackboard. I suggest reading the two pieces, then watching the video and responding in the #module2 channel on Slack. I’ve given some pointers on the Module 2 page for how to structure your response, since async discussion can be hard to get started.
In your Data Filtering and Working With Data lessons for Module 1, you walked through the first steps of reading a dataset, which we’ll dive into more deeply this week. In Module 2, you have a data critique task and a data cleaning task. A data critique is the process of reviewing your data before you start working with it and manipulating it. (Think of it as your first reading of an archival document.)
Reading data is a skill, just like reading primary documents is a skill. To help you acquire this skill, I’ve assigned everyone one or more datasets (find your assigned datasets here; you’ll need to go to the datasets page to actually download them). Everyone has roughly the same amount of work, since some datasets are longer or more complex than others. I’ve assigned you each different kinds of datasets related to your research interests, because even if you plan to use your own data for the final project, it’s helpful to see how other projects have structured their data. You can also combine and compare datasets for the final project, so this is also a first look at some of your options for final project topics.
The Module 2 page walks you through the steps and requirements of the data critique, but to start off, I recommend downloading one dataset and opening it in Excel, Numbers, or Google Sheets. If you have more than one dataset assigned, the later ones will go more quickly once you have a sense of what you’re looking for. Remember to use some of the navigation hotkeys I mentioned in Module 1 to make your life easier (these may have different keys if you use something besides Google Sheets, but there are analogs in any spreadsheet program).
Your second task, data cleaning, will probably take longer than the data critiques, so allot time for it. Data cleaning is often the biggest, and most invisible, part of any DH project, and one you will almost certainly spend a large chunk of time on for your final project. Data is often messy even under the best circumstances. Consider the time tracking sheet I asked you to do with very vague instructions back in Module 0. I left those instructions intentionally vague not because I particularly care how you use your time, but because it helps illustrate the point I’m making now: even in a very tiny dataset about how six people spend their time, we have multiple people entering data; multiple date formats; multiple interpretations of what “Hours” means (the hours spent or the time on the clock?); and multiple spellings and phrasings for the same general thing (like the HTML assignment). Even the cleanest of modern data will be like this, and historical data is often much worse. Your first scholarly decisions about how to handle your data start with cleaning.
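If it helps to see those cleaning decisions written out, here is a minimal sketch in Python. It is only an illustration, not the method for this module (you will be working in OpenRefine and a spreadsheet). The sample values, the list of date formats, and the alias table are all made up for the example and are not taken from our actual time tracking sheet. It also assumes English month names.

```python
from datetime import datetime

# Assumed date formats; a real sheet may need others added here.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]

# Hypothetical alias table: each variant spelling maps to one chosen label.
TASK_ALIASES = {
    "html hw": "html assignment",
    "html homework": "html assignment",
}


def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable values are left for a human to review


def normalize_task(text):
    """Lowercase, collapse extra spaces, then apply the alias table."""
    key = " ".join(text.lower().split())
    return TASK_ALIASES.get(key, key)


def parse_hours(text):
    """Read 'Hours' as a duration, whether typed as 1.5 or as 2:00-3:30."""
    text = text.strip()
    if "-" in text:
        # A clock range: convert start and end times into elapsed hours.
        start, end = (datetime.strptime(t.strip(), "%H:%M")
                      for t in text.split("-"))
        return (end - start).total_seconds() / 3600
    return float(text)


def clean_row(row):
    """Apply all three decisions to one row of the sheet."""
    return {
        "date": parse_date(row["date"]),
        "hours": parse_hours(row["hours"]),
        "task": normalize_task(row["task"]),
    }
```

Notice that every function encodes a choice: which date formats count as valid, which spellings mean the same task, and whether a clock range should be read as a duration. Those are the same choices you will make by hand or in OpenRefine, and they are worth writing down either way.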
I will be available on Zoom during Monday office hours, our Wednesday afternoon scheduled meeting time, and throughout the week on Slack and email to help troubleshoot the assignments. You guys did a great job last week helping each other troubleshoot and identifying common issues. OpenRefine needs to be downloaded and installed, so I recommend doing that early in the week in case you run into issues.