Module 4 Assignments

Being familiar with other programming languages is of course helpful, but sometimes it proves to be a hindrance when your brain keeps jumbling languages together. I use SAS and R for data analysis at work, and R is just similar enough to Python that I kept catching myself subconsciously switching to R syntax as I went through the assignments. That aside, my biggest frustration was that I’ve never used a language where indentation matters! I’m used to indenting code for aesthetics and readability, but today I kept missing the proper indentation, and in Python that breaks things.
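To make the indentation point concrete, here is a small sketch. The function names and the word list are made up for illustration. In R the braces mark where a block ends, so indentation is cosmetic; in Python the indentation itself decides which lines belong to the loop.

```python
def count_long_words(words, min_length=5):
    """Count the words that have at least min_length characters."""
    count = 0
    for word in words:
        if len(word) >= min_length:
            count += 1  # inside the if: runs only for long words
    return count  # outside the loop: runs once, after every word is checked


def count_long_words_misindented(words, min_length=5):
    """Same code, but the first return is indented one level too far."""
    count = 0
    for word in words:
        if len(word) >= min_length:
            count += 1
        return count  # inside the loop: exits after the first word
    return count


sample = ["river", "dam", "watershed", "flood"]
print(count_long_words(sample))              # 3
print(count_long_words_misindented(sample))  # 1
```

Both versions run without an error message, which is what makes this kind of mistake easy to miss: the second one just quietly returns the wrong answer.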

I think Geocoding will be a helpful tool in my future research, as I’m primarily interested in environmental history, which of course is very place-based. The Gender Inference tool is also very cool. I’ve seen a similar mechanism that predicts race/ethnicity based on name, which is perhaps a little less reliable. It would be interesting to do a validation test, comparing the Gender Inference API’s name-based results to gender in a dataset where it was collected and recorded.
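A rough sketch of what that validation test might look like is below. Everything in it is an assumption for illustration: the records are invented, and the predictions are hard-coded placeholders rather than real output from the Gender Inference API. The idea is just to report how many names got a prediction at all (coverage) and how often the prediction matched the recorded value (accuracy).

```python
from collections import Counter

# Hypothetical records: (name, gender recorded in the dataset, gender predicted from the name).
# The predictions are hard-coded placeholders, not real output from any API.
records = [
    ("Mary", "female", "female"),
    ("John", "male", "male"),
    ("Leslie", "male", "female"),
    ("Ann", "female", "female"),
    ("Jordan", "female", None),  # None = the tool returned no prediction
]


def validate(records):
    """Return coverage, accuracy among predicted names, and confusion counts."""
    # Keep only the (recorded, predicted) pairs where a prediction exists.
    predicted = [(rec, pred) for _, rec, pred in records if pred is not None]
    coverage = len(predicted) / len(records)
    correct = sum(1 for rec, pred in predicted if rec == pred)
    accuracy = correct / len(predicted) if predicted else 0.0
    confusion = Counter(predicted)  # counts each (recorded, predicted) pair
    return coverage, accuracy, confusion


coverage, accuracy, confusion = validate(records)
print(f"coverage: {coverage:.2f}")
print(f"accuracy: {accuracy:.2f}")
for (recorded, pred), n in sorted(confusion.items()):
    print(f"recorded={recorded}, predicted={pred}: {n}")
```

Keeping the confusion counts alongside the single accuracy number would show whether the errors lean in one direction, which matters if the tool is weaker for some kinds of names than others.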

This is perhaps low-hanging fruit, but I’m fascinated by the idea and methodology of pulling data from social media. Modern communication is so digital, and I think that Twitter especially is a super rich and obviously huge source of data that can be used for all kinds of historical and social inquiry. It provides a useful means to identify trends in what people are talking about, what words they are using, and so on. I think this could be particularly interesting to examine in times of disaster or unrest. I believe there is a Twitter API that researchers can use to access tweet data. I remember once hearing about a project where researchers at the CDC used Twitter posts and Google search data to identify drug overdoses before they happened, and similar methods have been used to look at potential flu outbreaks. Those aren’t historical projects, of course, but they are still interesting, and the methodology could be adapted to historical questions. We could, for example, look at areas where flu outbreaks happened in a given year and explore what people in those areas were frequently tweeting about at the time. This has been a rambling paragraph, but the point is that social media is ubiquitous, and being able to access and analyze that data could enrich all sorts of projects.
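As a minimal sketch of the "what words are people in an area using" idea, the snippet below counts the most frequent words by area. The posts, the area names, and the stopword list are all made up; they stand in for data that would actually come from a social media API, which is not called here.

```python
import re
from collections import Counter

# Made-up posts standing in for data pulled from a social media API.
posts = [
    {"area": "County A", "text": "Everyone at school has the flu this week"},
    {"area": "County A", "text": "Home sick with the flu again"},
    {"area": "County B", "text": "Great weather for the fair this week"},
]

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "at", "has", "this", "with", "for", "a", "an", "of", "and"}


def top_words(posts, area, n=3):
    """Return the n most common non-stopwords in posts from one area."""
    counts = Counter()
    for post in posts:
        if post["area"] != area:
            continue
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", post["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


print(top_words(posts, "County A"))
```

With real data the same function could be run once per area and per time window, and the results compared against where and when outbreaks were recorded.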

One reply on “Module 4 Assignments”

Twitter does have a research API, but it’s a lot more of a pain in the ass these days to get an API key than it used to be. It’s also got some limitations as far as coverage: there’s a lower percentage of the population in rural areas, for example, who even have a Twitter account, so you’d have to somehow figure out the population coverage for the area you’re looking at. Not impossible, but annoying.

If you think you’d be doing more of the gender enrichment, genderize is good for modern names and ok for historical, but there’s a very robust R package that’s very strong for 19th century US names.

Assignments look good.
