
Module 4 Assignments

There was definitely a learning curve with Python in this week’s assignments. Some parts I retyped multiple times, only to realize the mistake was farther up in the code than where I was looking. Like many of my classmates, I also had to slow down to double-check what terms meant and how functions worked. I was constantly cross-referencing the assignments and Googling, especially for the Gender Inference assignment. Going through the assignment in reverse was really helpful because it forced me to slow down and take another look at terms I only half understood the first time. I also learned to appreciate that Colab makes you run each cell yourself; if everything were constantly updating, I think I would be hopelessly lost.

Overall, I believe that Gender Inference will be the most applicable to my future research. Most of my personal research centers on gender, so a program that could infer gender quickly would save time in the preliminary stages of research, when I am establishing research questions.

I would be interested in fetching data from the Museum of Menstruation and Women’s Health. The website is a digital archive of advertisements, articles, pamphlets, and even photographs of material culture held in the physical museum. I have done research with the site before, and because it is an older website, it is a little hard to navigate. With the way the archive is organized, it is easy to get lost and very easy to lose a source you just saw. It would be interesting to have access to metadata without having to count it by hand: for example, how many advertisements were published in a specific year or in a specific magazine, or how many educational pamphlets came out of each state in a specific year. The information is there; it is just broadly categorized and hard to visualize on the current webpage.
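Once the archive’s entries are collected into a structured form, the counts described above become a simple grouping problem. The sketch below assumes the items have already been gathered (by scraping or by hand) into a list of Python dictionaries; the field names (`type`, `year`, `publication`, `state`), the helper `count_by`, and all the values are hypothetical placeholders, not the site’s actual metadata.

```python
from collections import Counter

# Hypothetical, hand-entered records. The field names and values are invented
# for illustration; they are not taken from the museum's website.
records = [
    {"type": "advertisement", "year": 1925, "publication": "Magazine A"},
    {"type": "advertisement", "year": 1925, "publication": "Magazine B"},
    {"type": "advertisement", "year": 1931, "publication": "Magazine A"},
    {"type": "pamphlet", "year": 1931, "state": "Ohio"},
    {"type": "pamphlet", "year": 1931, "state": "Ohio"},
]

def count_by(records, item_type, *fields):
    """Count records of one type, grouped by one or more fields."""
    counts = Counter()
    for record in records:
        if record.get("type") != item_type:
            continue  # skip items of other types
        key = tuple(record.get(field) for field in fields)
        # Use a plain value as the key when grouping by a single field.
        counts[key[0] if len(fields) == 1 else key] += 1
    return counts

ads_per_year = count_by(records, "advertisement", "year")
ads_per_magazine = count_by(records, "advertisement", "publication")
pamphlets_per_state_year = count_by(records, "pamphlet", "state", "year")

print(ads_per_year.most_common())  # [(1925, 2), (1931, 1)]
```

A `Counter` keeps the tallies in one place and `most_common()` sorts them, which would make the results easy to hand off to a chart.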


Tags: Getting Data

2 replies on “Module 4 Assignments”

Holy moly, that menstruation site. That would be a big headache to scrape since it looks like it has no regular structure.

For example, if you right-click > Inspect or view the page source, you’ll see that the first few entries on the booklets page are structured as a garbage mess of h4s and weird spacing.

It could probably be done, but because there’s no consistent HTML div that keeps, e.g., the title separate from the other elements, you’d essentially have to pull all the text out and then clean it by hand. Not impossible, but a headache. That’s a great source, though.
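As a rough sketch of the “pull all the text out” approach, Python’s built-in `html.parser` module can collect the text of every `<h4>` element and collapse the stray whitespace. The sample markup and the class name `H4TextExtractor` are invented for illustration to mimic the kind of mess described above; they have not been checked against the actual booklets page, and splitting each entry into title, date, and so on would still have to be done by hand.

```python
import re
from html.parser import HTMLParser

class H4TextExtractor(HTMLParser):
    """Collect the text inside every <h4> element, in document order."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.in_h4 = False
        self.chunks = []    # text pieces for the <h4> currently open
        self.entries = []   # one whitespace-cleaned string per <h4>

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "h4" and self.in_h4:
            self.in_h4 = False
            # Collapse runs of spaces, newlines, and non-breaking spaces.
            text = re.sub(r"\s+", " ", "".join(self.chunks)).strip()
            if text:
                self.entries.append(text)

    def handle_data(self, data):
        # Text inside nested tags (e.g. after a <br>) is still captured.
        if self.in_h4:
            self.chunks.append(data)

# Hypothetical sample markup, not copied from the real site.
sample = """
<h4>Booklet title one<br>
   1930s,   &nbsp; publisher   unknown</h4>
<p>&nbsp;</p>
<h4>  Booklet   title two </h4>
"""

parser = H4TextExtractor()
parser.feed(sample)
parser.close()
print(parser.entries)
# ['Booklet title one 1930s, publisher unknown', 'Booklet title two']
```

Using only the standard library keeps the sketch dependency-free; a library like Beautiful Soup would do the same job with less code.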

Assignments look good, with one minor issue in this cell. In that cell, you’re setting the variable start_year to the text “start_year” instead of using JSON notation to tell Colab where to find the start_year information. Check how the title variable is located in that cell and how it differs from what you’re doing with start_year (this is a super common mistake and one I often make myself). Otherwise it looks great, and nice detailed comments on the gender inference notebook.
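The cell itself isn’t reproduced in this post, so here is a minimal sketch of the difference, assuming the data came back as a JSON object with `title` and `start_year` keys (the response text and values below are hypothetical). Quoting the name just creates a string, while bracket lookup pulls the value stored under that key, the same way title is found.

```python
import json

# Hypothetical API response; the real assignment's JSON structure may differ.
response_text = '{"title": "Example Title", "start_year": 1890}'
item = json.loads(response_text)  # parse the JSON text into a Python dict

title = item["title"]            # looks up the value stored under "title"

start_year_wrong = "start_year"  # just the literal text "start_year"
start_year = item["start_year"]  # looks up the value, like title above

print(title, start_year_wrong, start_year)  # Example Title start_year 1890
```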

All fixed, thank you!
