Here is my data set: https://github.com/elisabethmtatum/1770sample
Cleaning– I’ve spent the last two weeks cleaning my data and beginning to extend it. My first focus was cleaning the names. In order to show more connectivity in my network, I needed to reconcile variant spellings so that I could track the same person across multiple transactions. I started by cleaning full names together, which caught the most obvious variant spellings. While looking for names that sounded phonetically similar or differed by one letter, I also had to consider whether names had been anglicised (for instance, Johannes → John, Jacobus → Jacob). The second time I went through, I cleaned last names and first names separately to catch variants that were not picked up by the clustering algorithms in OpenRefine. When cleaning, I erred on the side of caution, not wanting to get overzealous with lumping people together when I don’t have additional biographical data to verify that they are the same person.
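To make the two-pass idea concrete, here is a minimal Python sketch of that kind of name clustering. It is not the OpenRefine implementation: the `fingerprint` function only imitates the general key-collision idea, the `ANGLICISED` table is a hypothetical two-entry mapping taken from the examples above, and the sample names and the 0.85 similarity threshold are invented for illustration. Near matches are only listed for manual review, in keeping with the cautious approach described above.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical mapping from Dutch/Latin forms to anglicised forms; extend by hand.
ANGLICISED = {"johannes": "john", "jacobus": "jacob"}

def fingerprint(name):
    """Lowercase, strip accents and punctuation, anglicise known tokens, sort tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", "", ascii_name.lower()).split()
    tokens = [ANGLICISED.get(t, t) for t in tokens]
    return " ".join(sorted(set(tokens)))

def cluster_exact(names):
    """Group spellings that share a fingerprint; return only groups with variants."""
    groups = defaultdict(set)
    for n in names:
        groups[fingerprint(n)].add(n)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

def near_matches(names, threshold=0.85):
    """List fingerprint pairs that are similar but not identical, for manual review."""
    keys = sorted({fingerprint(n) for n in names})
    pairs = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs

# Invented sample names, not taken from the data set.
sample_names = [
    "Johannes Van Wyck", "John Van Wyck", "john van wyck.",
    "Jacobus Swartwout", "Jacob Swartwoudt",
]
print(cluster_exact(sample_names))
print(near_matches(sample_names))
```

Running the same functions on a column of first names only, then on last names only, would mirror the second pass described above.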
Extending– I’ve extended my data by using the genderize API to assign gender. I then needed to go through the results manually to make sure that some obscure male names were not recorded as female. I still have a handful of people in my sample whose gender I have labeled as unknown.
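A rough sketch of how the gender step could be scripted in Python follows. It assumes the public genderize.io endpoint, which answers a `name` query with a JSON object containing `gender`, `probability`, and `count` fields. The thresholds, the `MANUAL_OVERRIDES` table, and its single entry are hypothetical stand-ins for the manual checking described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.genderize.io"

# Hypothetical hand-checked corrections for names the API handles poorly.
MANUAL_OVERRIDES = {"teunis": "male"}

def fetch_gender(first_name):
    """Query genderize.io for one first name and return the decoded JSON reply."""
    url = f"{API_URL}?{urlencode({'name': first_name})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def label_gender(first_name, reply, min_probability=0.9, min_count=50):
    """Turn an API reply into 'male', 'female', or 'unknown'.

    The thresholds are assumptions: low-confidence or rarely seen names
    are left as 'unknown' rather than guessed.
    """
    override = MANUAL_OVERRIDES.get(first_name.strip().lower())
    if override:
        return override
    gender = reply.get("gender")
    if gender is None:
        return "unknown"
    if float(reply.get("probability", 0)) < min_probability:
        return "unknown"
    if int(reply.get("count", 0)) < min_count:
        return "unknown"
    return gender
```

Calling `label_gender(name, fetch_gender(name))` for each distinct first name, rather than for each row, keeps the number of API requests small.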
I’m still working on code to scrape information from Find a Grave. If successful, I may add a presumed location to each creditor and debtor, based on the co-occurrence of last names and burial places in Dutchess County.
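Leaving the scraping itself aside, the inference step could look something like the Python sketch below. It assumes the scrape has already produced (surname, burial place) pairs. The rows shown are invented, and the `min_share` and `min_records` cut-offs are arbitrary assumptions meant to keep weak evidence from producing a location.

```python
from collections import Counter, defaultdict

def presumed_locations(burials, min_share=0.6, min_records=3):
    """Map each surname to its most common burial place, if the evidence is strong.

    burials: iterable of (surname, place) pairs.
    A surname gets a presumed location only when it has at least
    min_records burials and one place accounts for at least min_share of them.
    """
    by_surname = defaultdict(Counter)
    for surname, place in burials:
        by_surname[surname.strip().lower()][place] += 1

    result = {}
    for surname, counts in by_surname.items():
        total = sum(counts.values())
        place, n = counts.most_common(1)[0]
        if total >= min_records and n / total >= min_share:
            result[surname] = place
    return result

# Invented rows for illustration only; not scraped data.
sample_burials = [
    ("Brinckerhoff", "Fishkill"),
    ("Brinckerhoff", "Fishkill"),
    ("Brinckerhoff", "Fishkill"),
    ("Brinckerhoff", "Poughkeepsie"),
    ("Smith", "Rhinebeck"),
    ("Smith", "Fishkill"),
    ("Smith", "Amenia"),
]
print(presumed_locations(sample_burials))
```

A common surname spread evenly across several towns gets no presumed location, which is the cautious outcome.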
Visualizations– The first visualization I know I want to make is a network model. I’ve played around with my data by plugging it into the Quaker Networks D3 visualization (here’s the link if it doesn’t show below). I haven’t yet figured out how to add different formats, but I would like my final visualization to do the following:
—zoom the svg in and out to see the full network
—create a directed network with two different colors to represent whether the money is owed by or owed to the insolvent debtor
—thicken the width of lines depending on the amount owed or owing
—organize nodes (if possible?) by geographic location. Can a network model overlay a map?
—Add a sliding filter for dates. This is probably not necessary for this small sample, but I’d like to know how to do it for my data in the future.
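Several items on this list come down to how the data is prepared before it reaches D3. The Python sketch below shows one possible preparation step: it filters transactions by date, points each edge from debtor to creditor, colours it by direction, and scales line width by amount. The field names (`insolvent`, `other_party`, `direction`, `amount`, `date`), the colours, the width range, and the sample rows are all assumptions for illustration, and the `nodes`/`links` output shape is only the layout that many D3 force-directed examples read.

```python
import json
from datetime import date

# Hypothetical colours for the two directions of debt.
COLORS = {"owed_by": "#c0392b", "owed_to": "#2980b9"}

def filter_by_date(transactions, start, end):
    """Keep transactions whose ISO date (YYYY-MM-DD) falls within [start, end]."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [t for t in transactions if lo <= date.fromisoformat(t["date"]) <= hi]

def build_graph(transactions, min_width=1.0, max_width=8.0):
    """Build a nodes/links dict with direction, colour, and scaled width per edge."""
    if not transactions:
        return {"nodes": [], "links": []}
    amounts = [t["amount"] for t in transactions]
    smallest = min(amounts)
    span = (max(amounts) - smallest) or 1.0  # avoid dividing by zero
    names, links = set(), []
    for t in transactions:
        # Edges point from the person who owes to the person who is owed.
        if t["direction"] == "owed_by":
            source, target = t["insolvent"], t["other_party"]
        else:
            source, target = t["other_party"], t["insolvent"]
        names.update((source, target))
        width = min_width + (t["amount"] - smallest) / span * (max_width - min_width)
        links.append({
            "source": source,
            "target": target,
            "amount": t["amount"],
            "color": COLORS[t["direction"]],
            "width": round(width, 2),
        })
    return {"nodes": [{"id": n} for n in sorted(names)], "links": links}

# Invented rows for illustration only.
sample = [
    {"insolvent": "Debtor A", "other_party": "Person B", "direction": "owed_by",
     "amount": 10, "date": "1770-03-01"},
    {"insolvent": "Debtor A", "other_party": "Person C", "direction": "owed_to",
     "amount": 55, "date": "1771-02-10"},
    {"insolvent": "Debtor A", "other_party": "Person D", "direction": "owed_by",
     "amount": 100, "date": "1770-11-20"},
]
graph = build_graph(filter_by_date(sample, "1770-01-01", "1770-12-31"))
print(json.dumps(graph, indent=2))
```

Zooming and a live date slider would still have to be handled in the browser; this only shapes the file that the visualization loads.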
For my second visualization, I’d like to represent each insolvent debtor’s net worth. I’m thinking that a mirrored bar chart (like a population pyramid) might be an interesting way to show this.
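As a sketch of the numbers such a chart would need, the Python below totals what each insolvent debtor owes and is owed, one row per debtor. The field names and sample rows are hypothetical, and "net" here is simply money owed to the debtor minus money owed by the debtor within the sample, which is an assumption about what net worth should mean and ignores any property.

```python
from collections import defaultdict

def net_worth_rows(transactions):
    """Return one row per insolvent debtor with totals for each side of the chart.

    'owed_by' would be drawn to the left of the axis and 'owed_to' to the right,
    as in a population pyramid.
    """
    totals = defaultdict(lambda: {"owed_by": 0.0, "owed_to": 0.0})
    for t in transactions:
        totals[t["insolvent"]][t["direction"]] += t["amount"]

    rows = []
    for name in sorted(totals):
        owed_by = totals[name]["owed_by"]
        owed_to = totals[name]["owed_to"]
        rows.append({
            "insolvent": name,
            "owed_by": owed_by,
            "owed_to": owed_to,
            "net": owed_to - owed_by,
        })
    return rows

# Invented rows for illustration only.
sample_debts = [
    {"insolvent": "Debtor A", "direction": "owed_by", "amount": 10},
    {"insolvent": "Debtor A", "direction": "owed_by", "amount": 100},
    {"insolvent": "Debtor A", "direction": "owed_to", "amount": 55},
    {"insolvent": "Debtor B", "direction": "owed_to", "amount": 20},
]
for row in net_worth_rows(sample_debts):
    print(row)
```

Sorting the rows by `net` before plotting would put the most deeply indebted debtors at one end of the chart.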