I am using datasets provided by Prof. Kane that list economic transactions between Dutch fur traders and the indigenous trading partners who conducted business in Albany, NY. Most of these exchanges involve the furs of various mammals, including bears, martens, muskrats, and, most notably, beaver. These datasets are drawn from primary documents written by Jelles Fonda, Evert Wendell, and an unknown Dutch trader who operated in the Hudson Valley. The account books record more than just fur transactions: all three traders also commented on the appearance of their partners, their kin, where they lived, how much debt they owed, and the other trade goods they exchanged. A few non-native traders appear in these accounts, but not many.
I cleaned the Wendell dataset by fixing typos and combining records such as “?” and “???”. I matched up inconsistent abbreviations for the dates, like “Sep”, “Sept”, and “September”, all of which seemed to appear equally often in this dataset. Annoying. For all four datasets, I spent a long time combining records in the “Trade Goods”, “Items”, and other similar columns. I decided to put “1 bottle of rum”, “3 casks of rum”, and “2 rum” all under one record: Rum. A broader category like that tells me more, because every record that involves rum can be analyzed together instead of being scattered across smaller categories like “1-3 casks of rum” or “5 or more casks of rum”. I’m not interested in the quantity here; I’m more interested in the content. The same goes for “big knife”, which I edited to just “knife”.
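As an illustration only (this is not the workflow I actually used), the collapsing of goods records described above could be sketched in Python. The function name `normalize_good` and the keyword list are assumptions made up for this example, and the sample strings are the ones quoted above.

```python
import re

# Hypothetical keyword list: each keyword is a broad category that a raw
# record collapses into when the keyword appears in it as a whole word.
CATEGORIES = ["rum", "knife"]

def normalize_good(raw):
    """Collapse a raw goods entry such as '3 casks of rum' into 'Rum'."""
    text = raw.strip().lower()
    for keyword in CATEGORIES:
        # Whole-word match, so 'rum' does not accidentally match 'drum'.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            return keyword.capitalize()
    # Anything unrecognized is left as-is for manual review.
    return raw.strip()

samples = ["1 bottle of rum", "3 casks of rum", "2 rum", "big knife"]
for s in samples:
    print(s, "->", normalize_good(s))
```

A keyword list like this deliberately throws away quantity and size, which matches the decision to look at content rather than amount.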
For the Tribes column, I eliminated question marks. I decided to combine “Mahican” and “Mahican?” because something labeled “Mahican?” is already, in my mind, associated with “Mahican”, so the data is already skewed in that way. The question mark implies the label is uncertain or unconfirmed, but we cannot be completely sure about any of this data anyway, since no dataset is perfectly accurate and the traders writing these books could have made their own errors. For the same reason I edited the year records, changing “1698?” to “1698”. I changed a lot in the Town column as well, matching “of the Catskill area” with “Catskill” because both imply origin. I made “Canadian” and “lives in Canada” the same thing for simplicity’s sake. “Mohawk who lives in Seneca country” and “Seneca lives in Cayuga” became “Mohawk who lives among Seneca” and “Seneca who lives among Cayuga”. I wanted to keep the wording as consistent as possible to produce the cleanest dataset I could. Lastly, I moved some values that were in the wrong columns. For example, I moved “escorted by the Greyhead’s wife” from tribe to affiliation, where it matched other records.
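The question-mark cleanup is simple enough to sketch in a few lines of Python. This is a minimal illustration under the assumption that the uncertainty marker always sits at the end of the value; the function name `strip_uncertainty` is hypothetical.

```python
def strip_uncertainty(value):
    """Drop trailing question marks, e.g. 'Mahican?' -> 'Mahican'."""
    # Assumes the marker is always at the end of the cell.
    return value.strip().rstrip("?").strip()

for raw in ["Mahican?", "Mahican", "1698?"]:
    print(repr(raw), "->", repr(strip_uncertainty(raw)))
```

Note that a cell containing only question marks comes out as an empty string, so “?” and “???” end up as the same record.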
I also removed the pages field from all datasets. There is probably something interesting there: I could check whether the entries run in chronological order and, if not, whether certain transactions were recorded earlier than others, which might reflect some bias on the part of the trader. But there is no way to know for sure whether the order of the records on the pages is meaningful or completely random.
Another big thing I changed was the Date column. Some records included only the year, some the year and month, and some the year, month, and day. First, I separated the parts of each date with semicolons so a value looked like “Sept;18;1709”, and then I split the column on the semicolons. That left me with three extra columns. Next, I eliminated the column containing the day of the month, because days aren’t helpful to me and only a small portion of the records actually contained them. Then I sorted the months and years into their own columns. That took some time because records with missing parts didn’t split evenly, so months and years landed in different columns from row to row. But OpenRefine lets you apply an edit made in one cell to all identical cells, so fixing the year in one row fixes every matching row at once. Once I figured that out, it went a lot faster.
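For comparison, here is a rough Python sketch of the same date split. It is not what I ran in OpenRefine; the function name `split_date`, the month lookup table, and the rule that a four-digit number is the year while any shorter number is a day are all assumptions for illustration.

```python
# Hypothetical lookup: first three letters of a month -> full month name.
MONTHS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "oct": "October", "nov": "November", "dec": "December",
}

def split_date(raw):
    """Return (month, year) from values like 'Sept;18;1709' or '1709'."""
    month, year = None, None
    for part in raw.split(";"):
        part = part.strip().rstrip(".")
        if not part:
            continue
        if part.isdigit():
            # Assumption: four digits is a year; shorter numbers are days,
            # which are dropped on purpose.
            if len(part) == 4:
                year = part
        else:
            # 'Sep', 'Sept', and 'September' all share the prefix 'sep'.
            month = MONTHS.get(part[:3].lower(), part)
    return month, year

for raw in ["Sept;18;1709", "Sep;1709", "September;1709", "1709"]:
    print(raw, "->", split_date(raw))
```

Deciding what each part is by its shape, rather than by its position, avoids the uneven-split problem when the day or month is missing.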
Here are two sample visualizations I came up with. The first one is from the Wendell dataset. It shows the number of beaver furs traded each month and can be filtered by year using a slider. Tribe is shown by color. 1710 yields the most data here, almost forming a bell curve, with July having the highest number of beaver furs Wendell received that year (64 furs). This year also appears to include the most native groups, but other years show similar diversity (1704, 1706, and 1707, to name a few).
The second visualization is from the Jelles Fonda account book held at the New York Historical Society. Here I visualized the number of beaver furs traded by the top eighteen traders that Fonda worked with. Interestingly, the top two are non-native traders, as you can see from the color differentiation. I’d like to work a bit more with this dataset and pick out more variables to visualize, such as changes in the top traders over time. Unfortunately, this dataset does not have a gender column, but I can always try to make one from entries in the Name column that read “wife of…”.
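If I do build a gender column, a first pass could look something like this Python sketch. It is only an assumption about how the Name column might be used: the function name `infer_gender` and the sample names are hypothetical, and anything without a “wife of” phrase is left as unknown rather than guessed.

```python
def infer_gender(name):
    """Label a name 'female' if it contains 'wife of', else 'unknown'."""
    # Only the 'wife of' phrase is treated as evidence; everything else
    # stays 'unknown' instead of being assumed male.
    if "wife of" in name.lower():
        return "female"
    return "unknown"

# Hypothetical sample names, not taken from the account book.
for name in ["Wife of Trader A", "Trader B"]:
    print(name, "->", infer_gender(name))
```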