So now what?
We can read data, fetch data, and extend data — so what do we do with it? In this module we’ll start working with data visualization and start thinking about what we can do with it.
This week we’ll:
- Understand the basics of reading different kinds of data visualization
- Learn the different modes of conveying information in data visualization
- Get introduced to Tableau Public visualization software
- Module Outline
- Wednesday Agenda
- Final Project
- Assignment: Intro to Tableau
At our Wednesday meeting, we will troubleshoot the Intro to Tableau assignment.
- Comment below with your tentative data plans
- Make a free account at Tableau Public. You will need this username/password to complete many assignments. Tableau Public is different from Tableau.com!
- Download and install Tableau Public on your computer (this is free, and separate from making an account)
- Do the Intro to Tableau assignment
This looks like a lot of reading this week, but each individual chapter is fairly short!
Basic vocabulary and orientation to reading different kinds of charts:
- Fundamentals of Data Visualization Ch 3.1, Cartesian Coordinates
- Disregard the rest of the chapter after 3.1
- Fundamentals of Data Visualization Ch 4, Color Scales
Understanding and reading different kinds of charts
- Fundamentals of Data Visualization Ch 6, Visualizing Amounts
- Fundamentals of Data Visualization Ch 7.1, Visualizing a Single Distribution
- Read only the section on single histograms. We will not be dealing with or making the other kinds of distribution charts in this chapter.
- Fundamentals of Data Visualization Ch 10, Visualizing Proportions
- Fundamentals of Data Visualization Ch 11, Visualizing Nested Proportions
- Disregard the final section on parallel sets, which we will not be making or using
- Fundamentals of Data Visualization Ch 12 Intro and 12.1, Visualizing Associations
- Disregard all sections of this chapter after 12.1
- Fundamentals of Data Visualization Ch 13, Visualizing Time Series
- Disregard the final section on “Time series of two or more variables”
- Data Visualization and the Modern Imagination: read through all 6 exhibits
It’s time to start thinking about the final project already! Look through our data critique sheet for data of interest to you, and remember that you can use more than one data set if it makes sense.
In the comments to this post, tentatively identify the dataset you’ll be using for the final project. If you’re interested in one of the datasets I’ve provided, comment below with its name. If you’re thinking about using data you already have or plan to acquire, comment below with a link to the CSV on your GitHub (if you’ve already got it), or a link to a sample record or the API documentation if you plan on downloading it. More than one person can work on the same data set, so don’t worry if someone’s already commented with the name of the one you’re thinking of. You can change your mind about what data you want to use later, but I want an idea of your interests now so I can make suggestions to make your life easier down the road.
All projects that don’t use data I’ve provided require my explicit approval before moving forward. This is to save you headaches down the road. I will email you individually to let you know if the project is approved or if there are issues with your plan. Remember that I will not approve projects that require transcription. If you want to use your own data, I will not approve the project unless I can look at the data.
Assignment: Intro to Tableau
It may be helpful to view some of the Tableau intro videos before starting this assignment. The most useful for our class will be 1: Overview, 2: Connection to Excel and Text Files, 10: Joins and Unions, 11: Creating Your First Chart, 12: Using the Show Me Tool Bar, 13: Understanding the Logic of Charts, 14: Combining Sheets on a Dashboard, 15: Adding Interactivity, and 20: Publishing and Embedding. I strongly recommend Overview, Creating Your First Chart, and Publishing.
For the Intro to Tableau assignment, download this file of Albany census data from 1850-1940. Create a new Tableau file and connect the census data. Make a new sheet for each of the chart types below, and make sure to rename the sheet with the corresponding chart name. If you get stuck, you can view or download my examples here.
Save your work frequently. When you save, your work will be uploaded to your Tableau Public account. Make liberal use of the undo button/Cmd-Z if you mess something up.
Bar chart
- Make a simple bar chart by dragging Birthplace to Columns and Albany-Census to Rows.
- Sort the chart using the buttons in the toolbar or the icon that appears on hover over the y axis label.
- Remove New York from the chart by clicking New York > Exclude
- Create Birthplace groups by clicking Massachusetts, holding down Cmd (Mac) or Ctrl (PC), and clicking another US state. While hovering over one of the selected states, click the paperclip icon to group the states together.
- Group all the US states besides NY together, and use your judgement to choose how to group the other countries in Birthplace. Leave Ireland, Germany, and Italy ungrouped. When you finish, you should have bars for Ireland, Germany, Italy, one for all US states, and six or fewer other bars.
- Re-sort the chart after you’re finished making your groups
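If it helps to see the logic outside Tableau, the exclude, group, and sort steps can be sketched in Python. This is only an illustration under stated assumptions: the rows are made up (not the real census file), each row stands for one person, and the group names in `GROUPS` are hypothetical choices, not the ones you have to use.

```python
from collections import Counter

# Made-up sample rows for illustration; not the real census file.
birthplaces = ["New York", "Ireland", "Massachusetts", "Vermont",
               "Germany", "Ireland", "England", "Scotland", "New York"]

# Hypothetical grouping choices; yours may differ.
GROUPS = {
    "Massachusetts": "Other US states",
    "Vermont": "Other US states",
    "England": "Great Britain",
    "Scotland": "Great Britain",
}

def grouped_counts(values, groups, exclude=("New York",)):
    """Count rows per group after dropping excluded values."""
    kept = (v for v in values if v not in exclude)
    # Values without a group entry (e.g. Ireland) stay as their own bar.
    return Counter(groups.get(v, v) for v in kept)

counts = grouped_counts(birthplaces, GROUPS)
# most_common() lists the bars from largest to smallest, like re-sorting the chart.
sorted_bars = counts.most_common()
```

Notice that the sort happens after the grouping: a group's bar height is the sum of its members, which is why the chart needs to be sorted again once the groups exist.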
Side-by-side Bar chart
- Create a bar chart with the grouped Birthplace dimension you created for the bar chart, and Albany-Census in Rows.
- Exclude NY
- Drag the grouped Birthplace dimension to Marks > Color to color your bars
- Drag Year into Columns in front of Birthplace to separate charts by year
- Sort the chart
Dot plot
- Create a side-by-side bar chart using the steps above
- Under Marks, select circle
- Swap rows and columns using the icon in the toolbar to create a horizontal chart
Heatmap
- Drag Year to Columns and the grouped Birthplace dimension to Rows.
- Drag Albany-Census to Marks > Color
- Exclude NY and Null Birthplace
- Sort by immigration in 1850 so that the largest group in 1850 is at the top of the chart
- Adjust the size of the color blocks by hovering over the divide between country names or years and dragging the small arrow bar that appears.
Line chart
- Drag Year to Columns and Birthplace (not the grouped one!) to Rows.
- Drag the grouped Birthplace dimension onto the Color card.
- Hovering over Birthplace in Rows, click the options triangle and change Birthplace to Measure > Count
- Exclude NY and Null
- Under the Marks dropdown menu, select Line
- In the top toolbar where the dropdown menu says “Standard,” select “Fit to Width”
Histogram
- Drag Age to Rows and change it to Measure > Count
- Using the “Show Me” menu, select Histogram to create Age bins
- Drag Age from the Measures list to the Filter pane. Select “All Values” and only include ages 0-100 (values over 100 are probably an error!)
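As a rough sketch of what the histogram is doing behind the scenes, here is the same filter-then-bin idea in Python. The ages are made up for illustration, and the bin width of 10 is an assumption; Tableau chooses its own default bin size.

```python
from collections import Counter

# Made-up ages for illustration; 120 and 999 stand in for likely data-entry errors.
ages = [3, 7, 12, 18, 25, 25, 31, 47, 63, 120, 999]

BIN_SIZE = 10  # assumed bin width; Tableau picks its own default

def age_histogram(values, bin_size=BIN_SIZE, low=0, high=100):
    """Keep ages within [low, high], then count how many fall in each bin."""
    kept = [a for a in values if low <= a <= high]
    # Each bin is labeled by its lower edge: 25 -> 20, 31 -> 30.
    return Counter((a // bin_size) * bin_size for a in kept)

histogram = age_histogram(ages)
```

The filter runs before the counting, so the implausible ages never reach a bin and cannot stretch the axis.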
Population pyramid
- Filter Age as in the Histogram
- Right click Gender > Create > Calculated Field
- Name the new calculated field Female, and in the box below write the following formula:
- IF [Gender] = "Female"
- THEN 1
- END
- Make another calculated field for male
- We’re making these fields so that we can count women of a certain age separately from men of a certain age. A filter applies to the entire sheet, so if we want to visualize categories within our data separately on the same sheet, we need to mark the data we’re using. Our calculated fields simply check if a row is male or female and mark 1 if it’s the gender we’re looking for. In the next step we use those numbers to count up how many women and how many men are in each age bin.
- Drag Female and Male to Columns and Age (bin) to rows
- On the X axis of the left histogram, right click > Edit Axis and select “Reversed axis”
- For further information, see https://help.tableau.com/current/pro/desktop/en-us/population_pyramid.htm
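To make the role of the Female and Male calculated fields concrete, here is a small Python sketch of the same idea: flag each row, then add up the flags within each age bin. The rows are made up, the function names `flag` and `pyramid` are hypothetical, and the bin width of 10 is an assumption rather than Tableau's setting.

```python
from collections import defaultdict

# Made-up (gender, age) rows for illustration; not the real census file.
rows = [("Female", 4), ("Male", 8), ("Female", 23), ("Female", 27),
        ("Male", 25), ("Male", 61), ("Female", 150)]

def flag(gender, wanted):
    """Mirror of the calculated field: 1 for a match, otherwise None (like NULL)."""
    return 1 if gender == wanted else None

def pyramid(rows, bin_size=10, max_age=100):
    """Sum the Female and Male flags within each age bin."""
    totals = defaultdict(lambda: {"Female": 0, "Male": 0})
    for gender, age in rows:
        if not 0 <= age <= max_age:  # same Age filter as the histogram
            continue
        age_bin = (age // bin_size) * bin_size
        for wanted in ("Female", "Male"):
            # None contributes nothing, just as a sum skips NULL values.
            totals[age_bin][wanted] += flag(gender, wanted) or 0
    return dict(totals)

by_bin = pyramid(rows)
```

Each age bin ends up with two separate counts, one per flag, which is what lets the two histograms sit back to back on one sheet without a filter removing either gender.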
Box plot
- Filter Age as above
- Drag GQ Detail (Group Quarters Detail) to Columns and Age to Rows
- Right click Year > Show Filter
- In the Year filter that appears (usually on the right side of the chart), hover over the options triangle and select Single Value Slider
- Drag the Year filter to the top or bottom of the chart
- For further information, see https://help.tableau.com/current/pro/desktop/en-us/buildexamples_boxplot.htm
Pie chart
- Drag Birthplace (grouped) to Color and Albany-Census to Size
- In the Marks dropdown, select Pie
- Drag Birthplace (grouped) to Label
- Use the top toolbar dropdown to fit to Entire View
Stacked Bar
- Drag Year and Albany-Census to Columns and Rows
- Drag Birthplace (grouped) to Color
- Exclude NY
Proportional Stacked Bar
- Duplicate the Stacked Bar chart by right clicking the sheet name and selecting Duplicate
- Right click Albany-Census in the Rows shelf and select Quick Table Calculation > Percent of Total
- Right click Albany-Census again and select Compute Using > Table Down. This calculates the percentage of the total for that year, not the percent of total from all years.
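The difference between percent of each year's total and percent of the total across all years is easier to see with numbers. The Python sketch below uses made-up counts, and the function name `percent_within_year` is hypothetical; it illustrates the arithmetic, not how Tableau implements the table calculation.

```python
from collections import defaultdict

# Made-up counts per (year, birthplace group) for illustration only.
counts = {
    (1850, "Ireland"): 30, (1850, "Germany"): 10, (1850, "Other US states"): 60,
    (1860, "Ireland"): 50, (1860, "Germany"): 25, (1860, "Other US states"): 25,
}

def percent_within_year(counts):
    """Divide each count by its own year's total, not the grand total."""
    year_totals = defaultdict(int)
    for (year, _group), n in counts.items():
        year_totals[year] += n
    return {key: 100 * n / year_totals[key[0]] for key, n in counts.items()}

shares = percent_within_year(counts)
```

With these made-up numbers, Ireland is 30 percent of 1850 and 50 percent of 1860. Measured against all 200 rows instead, Ireland in 1850 would be only 15 percent, and no year's bar would reach 100 percent.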
Tree map
- Drag Albany-Census to Marks > Size
- Drag Year and Birthplace (grouped) to Label
- Drag Occupation to Color and in the popup that appears, select “Filter and then add”
- Select Top > By Field > Top 10 of Albany-Census count
- Order matters! Tree maps in Tableau are sorted by the order of fields listed in the Marks pane. Drag the fields up and down the list to reorder the sort, and think about what it means to sort by one dimension before another.
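The Top 10 filter on Occupation can be thought of as "count every occupation, keep the most common ones, and drop all other rows." Here is a minimal Python sketch of that idea using made-up occupations and a top 2 instead of a top 10 so the result is easy to check by eye. The function name `top_n_filter` is hypothetical, and how ties at the cutoff are handled here is an assumption that may not match Tableau.

```python
from collections import Counter

# Made-up occupations for illustration; not the real census file.
occupations = ["Laborer", "Clerk", "Laborer", "Teacher", "Clerk",
               "Laborer", "Mason", "Tailor", "Clerk", "Mason"]

def top_n_filter(values, n):
    """Keep only rows whose value is among the n most frequent values."""
    top = {value for value, _count in Counter(values).most_common(n)}
    return [v for v in values if v in top]

kept = top_n_filter(occupations, n=2)
```

Only the Laborer and Clerk rows survive, so anything drawn afterwards describes those rows alone and not the whole population.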
Map
- Drag Birthplace to Columns and in the “Show Me” menu select Maps (not symbol maps!)
- Drag the new Longitude measure that has been created up to Columns
- In the Marks pane, there are now accordion sections labeled All, Longitude, and Longitude
- Duplicate the Birthplace dimension. Tableau can only understand one “level” of geographic divisions (e.g., if a dimension is interpreted as a state, it can’t be used to display country information even if it has country information in it). If we want to display both the state and country information contained in a column, we have to duplicate it.
- To tell Tableau to read the copied column as state, right click and select Geographic Role > State
- Expand the first Longitude section under Marks. Remove Birthplace from the pane and drag Birthplace (copy) to Detail
- To layer these maps on top of each other, right click the rightmost Longitude pill in the Columns shelf and select Dual Axis.
- Select All in the Marks pane and drag Albany-Census to Color
- Exclude NY
- Add a year filter with a single value slider
On your own
On your own, create three additional visualizations. Only one of these can be a bar chart of any kind. All should be filtered in some way. At least one must use a visible filter like the Show Year slider we used above. All three must be a type of visualization we made above, not any of the other types of visualization available under Show Me.
In each of your three new visualizations, point out something of interest using an annotation.
Combine your three visualizations in a story or dashboard. Save your workbook, and from the uploaded version of the workbook available in your Tableau Public profile online, select the three dots icon in the lower right. Grab the embed code. On the course website, create a new post and embed your workbook in the post (if you don’t see an image of your workbook immediately after pasting, it means you didn’t grab the embed code). Before posting, make sure to uncheck the Uncategorized category, and check the Tableau assignment category.
In the text below your embed, describe your three visualizations. You don’t need to analyze them or interpret them, but you should be able to narrate them. For example, if I were describing the visualization below, I would write:
In my choropleth map, I mapped all birthplaces of Albany residents except New York. This map is colored by the number of people born in each location. I included a slider to filter by year.
As you create and narrate each visualization, refer back to our reading for this week for the language to describe what you’re doing, and how to think about relationships between what you’re comparing.