
Module 4 email

This week we have no discussion starter video and no reading, because we have 4 longer assignments and a post on the course site. Highlights for this week include the corniest shit I have ever said on video, because this week will probably be hard: it involves a lot of new concepts and some logic that is probably unfamiliar.

The longest assignments this week will probably be the first two, the API and web scraping assignments. I strongly recommend doing the assignments in the order listed because the concepts build on one another. Remember that you already have all the skills you need to do these assignments: this week we’re just putting into practice things like loops and requests, and our web scraping assignment in particular builds on our first HTML and CSS assignment.

If you get stuck, look at both the cell you’re working on and the cells above it; sometimes editing a URL won’t immediately show a problem but will cause issues later on. If you’re having trouble articulating what you’re stuck on, try writing out comments in plain English, breaking down the steps of the program without detail, or inserting new code cells to try different things out before proceeding. Think about what kind of object each variable is (is it a string? A number? A file response? An array?) and see if you can find an example of what you’re trying to do elsewhere in the assignments. Use print(thing) to see what kind of result you’re working with, and use your close reading skills as a historian and from our data critique to break the problem down.
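To make the print(thing) advice concrete, here’s a minimal sketch of the kind of inspection I mean. The URL is a made-up placeholder; swap in whatever endpoint your assignment actually uses:

```python
import requests

# Hypothetical URL, invented for this sketch; substitute the endpoint
# from the assignment you're actually working on.
url = "https://example.com/api/items?page=1"
response = requests.get(url)

# Before going any further, check what you actually have:
print(type(response))        # a Response object, not a plain string
print(response.status_code)  # 200 means the request worked; 404 means the URL is wrong
print(response.text[:500])   # peek at the first 500 characters of the body
```

type() tells you whether you’re holding a string, a response object, or something else, which is usually the fastest way to figure out why the next cell is complaining.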

Nothing you can do with these assignments will break anything, so play around. If you get irreversibly stuck, you can use Colab’s revision history to revert to an earlier version of your assignment. And you can always start completely from scratch by going back to my original version of the assignment and forking it again.

This week’s assignments are mostly focused on getting and extending data (hence the module name), but the ability to work with an API can be really powerful. Increasingly, content management systems for libraries, archives, and museums ship with an API built in to access collections data. Omeka, eMuseum, and many other such systems include an API as part of their basic framework, so no additional work by museum and archives staff is needed to make collections accessible. (PastPerfect, one of the most common collections management packages, lacks an API.) Many museum CMS APIs also let you programmatically bulk-update the public-facing website with new objects: for example, you could create a spreadsheet with all new acquisitions for the month and write a program to bulk-update the online catalog, without one person having to individually update each metadata field for each new object. We won’t be doing that in this class, but after this week you’ll have all the skills to do it. The Getty Museum has a good discussion of considerations for museum CMS if you anticipate a career in museums or archives. One of my other goals, besides good troubleshooting skills, is that all of you will come out of this class able to say “this task is taking forever to do by hand; I bet there’s a better tool to do this repetitive thing.”
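To give you a feel for that spreadsheet-to-catalog workflow, here’s a rough sketch. Everything in it is hypothetical: the endpoint, the API key, and the column names are stand-ins for whatever a real CMS API would actually expect, but the shape (read a row, send a request, repeat) is the whole idea:

```python
import csv
import requests

# Hypothetical endpoint and key, invented for this sketch.
API_URL = "https://museum.example.org/api/items"
API_KEY = "your-key-here"

with open("new_acquisitions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # One POST per spreadsheet row: the loop does the repetitive work
        # someone would otherwise do field by field in the admin interface.
        record = {"title": row["title"], "date": row["date"], "creator": row["creator"]}
        response = requests.post(API_URL, json=record, params={"key": API_KEY})
        print(row["title"], response.status_code)
```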

Finally, note that this week you’ll post your links as a post, rather than a comment, and I want you to look around the internet for a museum or archive that either has an API or has structured item pages that are good for scraping, like the Library and Archives Canada example. This is for two reasons: first, so you can identify places in the wild where you might apply your new skills; and second, so you can identify possible data to use in the final project if none of my curated sets has sparked your interest. (And third, so that if it’s really cool data, I can add it as a curated dataset for future versions of this class.)
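If you’re not sure what “structured item pages” means in practice: it means every item in the collection puts the same metadata in the same place in the HTML, so one selector works for all of them. A quick sketch, with a made-up URL and a made-up CSS class, just to show the pattern:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical URL and CSS class, invented for this sketch.
url = "https://archive.example.org/items/1234"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# If every item page marks its title with <h1 class="item-title">, this
# same selector pulls the title from any item in the collection.
title = soup.select_one("h1.item-title")
print(title.get_text(strip=True) if title else "no title found on this page")
```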

If I have not already approved your use of a dataset from your own research for the final project, you should start thinking about which one (or more) of the datasets I’ve curated you’d like to use. Part of the reason we did the data critique assignment was so that I could match you all up with datasets I thought might be of interest given your research interests, and so that you all could write up the datasets in detail for each other to consult. If you’re not sure what you might want to work on, look through the data critique sheet to get an idea of what’s available, and through the final project requirements for ideas of what you might do. For the final project, you don’t have to stick with the dataset you wrote about for the data critique. If you want to use data other than something I’ve provided, I’ll need to see it, and your code for scraping or fetching it if applicable, by Module 6.

I am slightly behind on responding to the #module-3 discussion and your assignment posts this week; I should be caught up by the end of Sunday evening. Part of the reason I’m flexible with deadlines for you all is that sometimes I also need flexibility in this goofy online semester, and I trust that as adults you’re all capable of judging your own scheduling needs.

I will be available on Zoom during Monday office hours (2-3 PM), at our Wednesday afternoon scheduled meeting time, and throughout the week on Slack and email to help troubleshoot the assignments. Once you’ve forked an assignment, remember that you can link directly to it; this is especially helpful if you’re struggling to find the language to describe the problem. Include the link to your assignment whenever you ask for help!