Media day (week of 2023-10-30)
EDA stands for Exploratory Data Analysis
EDA is the step of “getting to know” your data
You have already been doing some EDA by sorting the data and understanding what is in each column
Another very useful tool for EDA is data visualization
Today we will learn how to visualize data using ggplot2
ggplot2 is included in the tidyverse set of packages
R also has a built-in plotting function, plot(). But ggplot2 has more consistent syntax.
Colors don’t correspond to contents (meat is green?)
3D doesn’t have any meaning; it only makes the plot more complicated
Hard for humans to visually compare area
Simple
Easy to understand
Conveys a message
Pie chart
Bar graph
The ggplot2 approach to plotting
Pie chart
Bar graph
How is population represented (“mapped”) in each plot?
What is the shape (“geometry”) of each plot?
Pie chart
Bar graph
We will continue to use the gapminder-analysis project in the data-analysis-course folder on your Desktop
Also create a file to write today’s code. You could call it data-viz-practice.R
Load the tidyverse package with library()
Also load the scales package, which is for making plot labels
Use the read_csv() function to load a spreadsheet as a dataframe
This is a dataset of economic statistics from various countries over time, from https://gapminder.org
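The loading steps above can be sketched as follows. The file here is a hypothetical two-row stand-in written to a temporary location so the sketch runs anywhere; its values are invented, and in the lesson you would point read_csv() at your own gapminder CSV instead.

```r
library(tidyverse) # provides read_csv() (from readr) and ggplot2
library(scales)    # helpers for formatting plot labels

# Hypothetical stand-in file with invented values (not real gapminder data)
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    "country,continent,year,lifeExp,pop,gdpPercap",
    "Country A,Asia,2000,60.5,1000000,1500.25",
    "Country B,Europe,2000,75.1,2000000,20000.75"
  ),
  csv_path
)

# read_csv() returns a dataframe (a tibble)
gapminder <- read_csv(csv_path)

# Show each column with its type and first values
glimpse(gapminder)
```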
The meaning of some columns is obvious (country, continent, year), but not others
pop: Population
lifeExp: Life expectancy (寿命)
gdpPercap: GDP per capita (一人当たりの国内総生産)
In the last challenge, we saw a general trend, but there could be more detail within certain groups, like continent or country
Let’s use color to show the continent
What is color in ggplot2?
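In ggplot2, color is an aesthetic, so it can be mapped to a column inside aes() just like x and y. A minimal sketch, assuming a scatter plot of life expectancy against GDP per capita; the toy data below is invented and only borrows the gapminder column names.

```r
library(tidyverse)

# Toy data: invented values for hypothetical countries
toy <- tribble(
  ~country,    ~continent, ~lifeExp, ~gdpPercap,
  "Country A", "Asia",     60,       1500,
  "Country B", "Asia",     68,       4000,
  "Country C", "Europe",   75,       20000,
  "Country D", "Europe",   78,       30000
)

# color = continent gives each continent its own point color and adds a legend
p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
p
```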
ggplot(data = INPUT-DATA, mapping = aes(MAPPING)) +
GEOMETRY
Since data always comes first and mapping second, we can omit those names and make our code a little simpler:
ggplot(INPUT-DATA, aes(MAPPING)) +
GEOMETRY
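The two forms of the template can be compared on a small example. This is a sketch with invented toy data; both calls should describe the same plot.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~year, ~lifeExp,
  1990,  58,
  2000,  61,
  2010,  65
)

# Long form: argument names written out
p_named <- ggplot(data = toy, mapping = aes(x = year, y = lifeExp)) +
  geom_point()

# Short form: data and mapping are matched by position
p_short <- ggplot(toy, aes(x = year, y = lifeExp)) +
  geom_point()

p_short
```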
Let’s try representing the data with a different geometry (shape)
This time we will use lines (geom_line())
Lines connect points along the x-axis
But we only want to connect points within each country
We need to add another aesthetic mapping for that: group
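A minimal sketch of the group aesthetic, using invented toy data with two hypothetical countries:

```r
library(tidyverse)

# Toy data: invented values, two hypothetical countries over three years
toy <- tribble(
  ~country,    ~year, ~lifeExp,
  "Country A", 1990,  58,
  "Country A", 2000,  61,
  "Country A", 2010,  65,
  "Country B", 1990,  72,
  "Country B", 2000,  75,
  "Country B", 2010,  78
)

# Without group, geom_line() would join all six points into one zigzag line;
# group = country draws one line per country
p <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line()
p
```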
We can add additional plot layers using the + sign
For example, let’s add points on top of the lines
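Adding points on top of lines could look like this. The data is an invented toy example; each + adds one more layer to the plot.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~country,    ~year, ~lifeExp,
  "Country A", 1990,  58,
  "Country A", 2000,  61,
  "Country A", 2010,  65,
  "Country B", 1990,  72,
  "Country B", 2000,  75,
  "Country B", 2010,  78
)

p <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +  # first layer: one line per country
  geom_point()   # second layer: points drawn on top of the lines
p
```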
The ggtitle() function adds a title to a plot. Use ggtitle() as another layer to add a title to the last plot.
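One possible answer, sketched with invented toy data. The title text is only an example.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~country,    ~year, ~lifeExp,
  "Country A", 1990,  58,
  "Country A", 2000,  61,
  "Country B", 1990,  72,
  "Country B", 2000,  75
)

p <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  geom_point() +
  ggtitle("Life expectancy over time") # the title is added like any other layer
p
```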
Each layer modifies the plot, so you can build it gradually
Other things layers can do:
So far, we have been mapping aesthetics to variables in the data
But you can also simply assign the same value to a particular aesthetic (such as color)
Do this by setting its value outside of mapping = aes()
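A short sketch of the difference, with invented toy data. Here color is given directly to the geometry, outside aes(), so every point gets the same color and no legend is drawn.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~continent, ~lifeExp, ~gdpPercap,
  "Asia",     60,       1500,
  "Asia",     68,       4000,
  "Europe",   75,       20000
)

# color is set to a fixed value, not mapped to a column
p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "blue")
p
```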
ggplot2 can make multiple plots at once using faceting
Each facet is a mini-plot of some portion of the dataset
Specify the variable to facet by with vars()
Let’s try this for some countries in Asia
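A minimal sketch of faceting, assuming facet_wrap() is the faceting function used. The toy data is invented, and the country names are hypothetical placeholders rather than real Asian countries.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~country,    ~continent, ~year, ~lifeExp,
  "Country A", "Asia",     1990,  58,
  "Country A", "Asia",     2010,  65,
  "Country B", "Asia",     1990,  66,
  "Country B", "Asia",     2010,  71,
  "Country C", "Europe",   1990,  72,
  "Country C", "Europe",   2010,  78
)

# Keep only the rows for Asia
asia <- filter(toy, continent == "Asia")

# facet_wrap(vars(country)) draws one mini-plot per country
p <- ggplot(asia, aes(x = year, y = lifeExp)) +
  geom_line() +
  facet_wrap(vars(country))
p
```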
If you need to, you can save your plot in R and write it out as an image file
Use the ggsave() function
It can save the plot as .jpg, .png, etc.
ggplot(INPUT-DATA, aes(MAPPING)) + GEOMETRY
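Saving a plot with ggsave() can be sketched like this. The file name, size, and temporary folder are assumptions for illustration, and the data is an invented toy example; ggsave() chooses the image format from the file extension.

```r
library(tidyverse)

# Toy data: invented values
toy <- tribble(
  ~year, ~lifeExp,
  1990,  58,
  2000,  61,
  2010,  65
)

# Store the plot in an object so it can be passed to ggsave()
p <- ggplot(toy, aes(x = year, y = lifeExp)) +
  geom_line()

# Hypothetical output path in a temporary folder; the .png extension selects PNG
out_file <- file.path(tempdir(), "life-exp.png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

In your own project you would replace out_file with a path inside the project folder.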