For my final project, I am going to use the dataset from the “Dickinson College Archives, Carlisle Indian School Digital Resource Center”. This dataset lists individuals who attended the Carlisle Indian School and provides other information about them, such as age, height, tribe, and whether their parents were still living. In cleaning the data, I’m going to focus on the tribal names, because the same tribe often appears under a variety of spellings. To make exploring my argument easier, I’m also going to extend the data with a region column that groups tribes from the same region.
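A minimal sketch of how the spelling cleanup and the region column could work, assuming the data is loaded as a list of rows with a `tribe` field. The field name, the variant spellings, and the region labels below are all hypothetical placeholders for illustration; the real lookup tables would have to be built by reviewing the values that actually appear in the dataset.

```python
# Hypothetical lookup tables (NOT taken from the dataset): each known variant
# spelling, lowercased, maps to one canonical tribal name.
CANONICAL_TRIBE = {
    "sioux": "Sioux",
    "souix": "Sioux",
    "apache": "Apache",
    "apachee": "Apache",
    "oneida": "Oneida",
}

# Hypothetical region assignments for the canonical names above.
TRIBE_REGION = {
    "Sioux": "Great Plains",
    "Apache": "Southwest",
    "Oneida": "Northeast",
}


def clean_tribe(raw):
    """Trim, lowercase, and collapse spaces, then look up the canonical name.

    Values with no known mapping are returned trimmed but otherwise unchanged,
    so they can be reviewed by hand later.
    """
    key = " ".join(raw.strip().lower().split())
    return CANONICAL_TRIBE.get(key, raw.strip())


def add_region(row):
    """Return a copy of the row with a cleaned tribe and a new region field."""
    tribe = clean_tribe(row["tribe"])
    return {**row, "tribe": tribe, "region": TRIBE_REGION.get(tribe, "Unknown")}


# Made-up rows standing in for the real records.
rows = [
    {"name": "Student A", "tribe": " Souix "},
    {"name": "Student B", "tribe": "APACHE"},
    {"name": "Student C", "tribe": "Unlisted"},
]
cleaned = [add_region(r) for r in rows]
```

Keeping unmatched values as "Unknown" rather than guessing makes it easy to see which spellings still need a decision.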

One visualization I am thinking of for this project is a map of the regions I’m separating the tribes into, with the school’s location also marked. This will give viewers an idea of where the tribes are located in relation to the school. Another visualization I am thinking of using is some type of chart showing the percentage of students from each region who return home. With that chart I will be able to test my argument that a student is more likely to stay at the school longer if they’re from a region farther away, due to the possible difficulty of returning home.
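The percentages behind that chart could be computed along these lines. This is only a sketch: it assumes each cleaned row has a `region` field and a true/false `returned_home` field, both of which are hypothetical names, and the sample rows are made up rather than drawn from the dataset.

```python
from collections import defaultdict


def percent_returned_by_region(rows):
    """Return {region: percentage of students in that region who returned home}."""
    totals = defaultdict(int)    # students counted per region
    returned = defaultdict(int)  # students per region who returned home
    for row in rows:
        totals[row["region"]] += 1
        if row["returned_home"]:
            returned[row["region"]] += 1
    return {region: 100 * returned[region] / totals[region] for region in totals}


# Made-up rows for illustration only.
sample = [
    {"region": "Great Plains", "returned_home": True},
    {"region": "Great Plains", "returned_home": False},
    {"region": "Southwest", "returned_home": True},
    {"region": "Southwest", "returned_home": True},
    {"region": "Southwest", "returned_home": True},
    {"region": "Southwest", "returned_home": False},
]
percentages = percent_returned_by_region(sample)
```

The resulting dictionary can be passed to whatever charting tool is used for the bar chart, with one bar per region.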

Good. The cleaning is probably going to be a significant chunk of your project time, so sink a good amount of time into it to make your life easier down the road, and stay focused on your initial question. This dataset is big and messy, so it will be easy to get sidetracked into lots of different issues. I’d rather see you finish the semester with two really strong, clear visualizations than ten messy, unfocused ones, and careful cleaning will help you do that. Once you’re mostly satisfied with your cleaning, post the cleaned/edited spreadsheet to GitHub or Google Sheets so I can eyeball it and see what you’re working with.

