Module 4 Assignments and Reflection

Assignments: API, Web Scraping, Georeferencing, & Gender Inference.

I found the assignments for this week quite interesting, if a bit overwhelming and confusing at times. For the most part, I was able to work through the hiccups I had by trial and error, though this was less applicable in the Gender Inference assignment. I found this one particularly challenging because it was far more difficult than I first expected to reorient how we approach programming, i.e. explaining pre-written code line by line. To respond to prompts with code is one thing, but to do the inverse is quite another. That said, I found this exercise perhaps the most stimulating, as it really made me think about the logical foundations for every line of code.

For my own research purposes, however, I think the most applicable concept explored this week is that of web scraping. While I can certainly see the value in georeferencing and the most elementary API fetching, I think the ability to programmatically fetch data from machine-readable texts might be the most useful. I cannot say I am all that familiar with APIs in my field, but I presume there are collections of digitized records that could be scraped to build interesting datasets about military unit compositions, recruitment, performance, etc.

To that end, perhaps a good source for such web scraping can be located here. A repository of American military casualty reports during the Second World War, these records could potentially contain important data about technological lethality, casualty demographics, etc. This could be very applicable, given my interest in military technology, tactics and battlefield experience during that conflict.

2 replies on “Module 4 Assignments and Reflection”

Sent you an access request for the notebooks.

I’ve had students work on Vietnam-era NARA casualty records before, and they can be tricky because there’s sometimes not a whole lot recorded besides name, location of birth, unit, and rank. Depending on what you’re looking to do with it, those records sometimes need a lot of supplementation.

It does look like NARA has its own API (https://github.com/usnationalarchives/Catalog-API) for catalog metadata, as well as individual datasets created during WW2, like surveys of service members or POW tracking, such as this one: https://catalog.archives.gov/id/1263907. It looks like the information is numerically coded (i.e., male is entered as 1 and female is entered as 2, etc.), which would require some translation in OpenRefine to make life easier, but if you’re interested in using some of that for the final project let’s touch base on Zoom to make a plan.
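
As a rough sketch of what that translation step could look like in Python instead of OpenRefine: the column name `sex`, the sample rows, and the helper `decode_rows` are all made up for illustration, and only the 1 = male, 2 = female coding comes from the description above. The real dataset's layout and code tables would need to be checked against its documentation.

```python
import csv
import io

# Hypothetical sample rows; the real dataset's columns and codes may differ.
SAMPLE_CSV = """name,sex
DOE JOHN,1
ROE JANE,2
SMITH PAT,9
"""

# Only the 1/2 mapping is taken from the comment above.
SEX_CODES = {"1": "male", "2": "female"}


def decode_rows(text, column, codes):
    """Return rows with the coded column replaced by a readable label."""
    decoded = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row[column].strip()
        # Codes missing from the lookup are flagged rather than guessed at.
        row[column] = codes.get(raw, f"unknown ({raw})")
        decoded.append(row)
    return decoded


rows = decode_rows(SAMPLE_CSV, "sex", SEX_CODES)
for row in rows:
    print(row["name"], "->", row["sex"])
```

Flagging unrecognized codes instead of dropping them makes it easier to spot values that still need a lookup entry.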
