Visualizing Data

Data Visualizations

Examples of data visualizations include bar plots, box plots, scatter plots, histograms, and density plots. When creating data visualizations, authors should:

  • Show the raw data
  • Not conceal information
  • Check the pattern of distribution of the data
  • Take care when the samples are small
  • Present appropriate measures of precision or dispersion

Drummond, G. B., & Vowler, S. L. (2011). Show the data, don't conceal them. British Journal of Pharmacology, 163(2), 208–210. https://doi.org/10.1111/j.1476-5381.2011.01251.x

Dynamite Plots

Conceal Data

Dynamite plots are very common in the scientific literature. They represent the mean as a bar extending from the axis, with an error bar representing either the standard error or the standard deviation.

Dynamite plots:

  • Conceal the amount of data being represented,
  • Conceal the distribution of the data,
  • Obscure meaningful differences by always extending to the axis, and
  • Provide no more information than a table would.

Bar plots for men and women showing the mean and standard error for self-efficacy in four activities: non-school activities, STEM courses other than physics, non-STEM courses, and physics courses. Gender differences are largest in the physics courses, where women experienced much lower self-efficacy than men or than they experienced in other activities.
A dynamite plot from Nissen & Shemwell, PhysRevPER, 2016.
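
To make the concealment concrete, here is a minimal Python sketch of two small data sets that would produce identical dynamite plots. The values and the function name `dynamite_summary` are invented for illustration and do not come from the figure above. Only the standard library is used.

```python
from math import sqrt
from statistics import mean, stdev

def dynamite_summary(values):
    """Return the two numbers a dynamite plot draws: the mean and its standard error."""
    standard_error = stdev(values) / sqrt(len(values))
    return mean(values), standard_error

# Hypothetical data, chosen so that both groups share the same mean and spread.
single_peak = [2, 4, 4, 4, 5, 5, 7, 9]   # values cluster near the mean
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]  # no value is near the mean

print(dynamite_summary(single_peak))
print(dynamite_summary(two_clusters))
```

Both calls return the same mean and standard error, so the bars and error lines would be indistinguishable even though one group is bimodal and the other is not.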

Our Solution

Combined Plots

Scatter plots support readers in evaluating the statistical tests used, checking whether the data met the assumptions of those tests, and thinking critically about the authors' interpretations of those data.

combined scatter plot, density plot and box plot
Combined plots.

Scatter Plots

Show the Data

Scatter plots support readers in evaluating the statistical tests used, checking whether the data met the assumptions of those tests, and thinking critically about the authors' interpretations of those data.

An image comparing a bar plot to four scatterplots of data with very different distributions that produce the same bar plots.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4): e1002128. doi:10.1371/journal.pbio.1002128

Scatter plots show every data point. The points are shifted left and right by small amounts (jittered) to reduce overlap so that all of them can be seen.
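
Jittering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the figures here: the function name `jitter`, the uniform noise, the default width of 0.1, and the scores are all hypothetical.

```python
import random

def jitter(center, count, width=0.1, seed=None):
    """Return `count` x-positions spread uniformly within `width` of `center`."""
    rng = random.Random(seed)
    return [center + rng.uniform(-width, width) for _ in range(count)]

# Hypothetical scores for one group plotted at x = 1; several values repeat.
scores = [70, 70, 70, 72, 72, 75]
x_positions = jitter(center=1, count=len(scores), seed=42)

# Only x is shifted; the y-values stay exactly as measured.
points = list(zip(x_positions, scores))
```

Because the shift is applied only along the categorical axis, the three points at 70 no longer sit on top of each other, while their measured values are unchanged.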

Scatterplots:

  • show how much data there is,
  • show how the data was distributed,
  • can become cluttered when large amounts of data are presented, and
  • can make it difficult to estimate the means or medians when large amounts of data are presented.

Violin Plots

Show the Distribution

Violin plots present two mirrored density plots in a way that overlays well on box plots and scatter plots. Violin plots allow comparing the distributions of two data sets with very different sample sizes.

Figure showing educational debts owed by society to students from marginalized groups and how courses can eliminate, maintain, or increase those inequities.
Violin plots overlaid on scatter plots showing how instruction can eliminate, maintain, or increase society's educational debts.

A violin plot shows the probability density of the data. Each of its two mirrored density plots has an area of 1, so the whole violin has an area of 2. The share of the total area that falls within a range is the probability of finding a data point in that range.

Before modern computers, scientists could measure such areas by hand. They would draw the plot on thick paper, cut it out, and weigh it. Then they would cut out the range they were interested in and weigh that piece. The range's weight divided by the total weight is the probability of finding a point in that range.
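
The same ratio of areas can be computed directly. The sketch below assumes the density comes from a Gaussian kernel density estimate with a rule-of-thumb bandwidth. Both choices, the function name `probability_in_range`, and the scores are assumptions made for illustration; plotting software may use different kernels or bandwidths.

```python
from statistics import NormalDist, stdev

def probability_in_range(data, low, high, bandwidth=None):
    """Estimate P(low <= value <= high) from a Gaussian kernel density estimate."""
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption made for this sketch.
        bandwidth = 1.06 * stdev(data) * len(data) ** (-1 / 5)
    # The density is an average of one normal curve per data point, so its
    # area over a range is the average of each curve's area over that range.
    areas = [
        NormalDist(mu=x, sigma=bandwidth).cdf(high)
        - NormalDist(mu=x, sigma=bandwidth).cdf(low)
        for x in data
    ]
    return sum(areas) / len(areas)

scores = [52, 55, 58, 60, 61, 63, 64, 66, 70, 75]  # hypothetical data
share = probability_in_range(scores, 60, 70)

# A violin mirrors the density, so both areas double and the ratio is unchanged.
violin_share = (2 * share) / (2 * 1.0)
```

Here `share` plays the role of the cut-out piece's weight divided by the total weight.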

Violin plots:

  • show the distribution of the data independently from how much data there is,
  • do not indicate how much data there is, and
  • can make it difficult to estimate the means or medians when the data is not symmetric.

Box Plots with Notches

Show the Descriptives

Box plots show the median, interquartile range, and outliers. We use notched box plots, in which the notches show the 95% confidence interval of the median. If the notches of two box plots do not overlap, the difference between the medians is larger than the uncertainty in the measures.

box plots with notches
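
The text does not say how the notches were computed. A widely used approximation, and the default in matplotlib, places them at the median plus or minus 1.57 times the interquartile range divided by the square root of the sample size. The sketch below assumes that formula; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import median, quantiles

def notch_interval(values):
    """Approximate 95% interval for the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / sqrt(len(values))
    center = median(values)
    return center - half_width, center + half_width

def notches_overlap(a, b):
    """Return True when the two notch intervals share at least one value."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a

# Hypothetical scores for three groups of 21 students each.
group_a = list(range(1, 22))
group_b = [x + 10 for x in group_a]  # shifted far from group_a
group_c = [x + 2 for x in group_a]   # shifted only slightly

print(notches_overlap(group_a, group_b))
print(notches_overlap(group_a, group_c))
```

With these values the notches of `group_a` and `group_b` do not overlap, while those of `group_a` and `group_c` do.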

Box plots:

  • indicate the summary statistics (i.e., the median and interquartile range),
  • reveal outliers,
  • reveal skew,
  • hide some types of data distributions (e.g., bimodal),
  • provide a precise measure of the uncertainty (i.e., the notches), and
  • do not reveal how much data they describe.

Combining All Three

Maximize Information

Scatter plots show the data.

Violin plots show how the data was distributed.

Box plots show the descriptive statistics for the data.
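
One way to layer the three plots is sketched below with matplotlib. This is not the code behind the figures on this page: the function names, widths, colors, and output path are all assumptions, and matplotlib must be installed for `draw_combined_plot` to run.

```python
import random

def jittered_positions(center, count, width=0.08, seed=0):
    """Return x-positions scattered uniformly within `width` of `center`."""
    rng = random.Random(seed)
    return [center + rng.uniform(-width, width) for _ in range(count)]

def draw_combined_plot(groups, labels, path="combined_plot.png"):
    """Draw violin, notched box, and jittered scatter layers for each group."""
    # Imported here so the helper above stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    positions = list(range(1, len(groups) + 1))
    fig, ax = plt.subplots()

    # Violin layer: how the data are distributed.
    ax.violinplot(groups, positions=positions, widths=0.8, showextrema=False)

    # Box layer: the descriptive statistics, with notches for the median.
    # Outlier markers are switched off (a choice for this sketch) because
    # the scatter layer already draws every point.
    ax.boxplot(groups, positions=positions, widths=0.2,
               notch=True, showfliers=False)

    # Scatter layer: every data point, jittered along the x-axis.
    for position, values in zip(positions, groups):
        ax.scatter(jittered_positions(position, len(values)), values,
                   s=10, alpha=0.5, color="black")

    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_combined_plot([pre_scores, post_scores], ["Pre", "Post"])` with two lists of numbers would write a single image containing all three layers for each group.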