Covariance -- A Visual Walk Through

In a previous post, I walked through the calculation of variance and standard deviation, visualizing each step. This post does the same for another statistic: covariance.

Covariance is a measure of the joint variability of two random variables.
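To make "joint variability" concrete, here is a minimal sketch in R. The vectors `x`, `y_up`, and `y_down` are made-up illustrative values, not the data visualized in this post. When two variables tend to rise together, the covariance is positive; when one falls as the other rises, it is negative.

```r
# Hypothetical toy data, not the data visualized in this post
x      <- c(1, 2, 3, 4, 5)
y_up   <- 2 * x     # rises as x rises
y_down <- -2 * x    # falls as x rises

cov_up   <- cov(x, y_up)    # positive: the variables move together
cov_down <- cov(x, y_down)  # negative: the variables move in opposite directions

print(cov_up)
print(cov_down)
```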

Let’s look at the sample covariance equation as a whole:

\(cov(x,y) = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{n-1}\)
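
One way to connect this to the variance from the previous post: if both arguments are the same variable, each product becomes a squared deviation, and the formula reduces to the sample variance. Here \(s_x^2\) is notation assumed for this note (the sample variance of x); it is not used elsewhere in the post.

\(cov(x,x) = \frac{\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})}{n-1} = \frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1} = s_x^2\)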

And now let’s apply the equation to the following case:

Ready? Okay, now let’s walk through the calculation; there are 7 small steps:

Step 1: find the mean of x

\(\overline{x}\)

Step 2: find the mean of y

\(\overline{y}\)

Step 3: calculate the difference between each x and the mean of x

\(x_i-\overline{x}\)

Step 4: calculate the difference between each y and the mean of y

\(y_i-\overline{y}\)

Step 5: multiply these differences (observation-wise)

\((x_i-\overline{x})(y_i-\overline{y})\)

Step 6: add these products (each one is the signed area of a rectangle)

\(\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})\)

Step 7: divide by the number of observations minus 1 (the result will be a bit larger in magnitude than the plain average of the products, which would divide by n)

\(cov(x,y) = \frac{\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})}{n-1}\)
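
The seven steps above can be sketched in R roughly as follows. The vectors `x` and `y` are made-up illustrative values (not the data plotted in this post), and the column names `dev_x` and `dev_y` are assumptions for this sketch; `rectangle` mirrors the column name used in the comparison further down.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)

mean_x <- mean(x)                     # Step 1: mean of x
mean_y <- mean(y)                     # Step 2: mean of y

df <- data.frame(x = x, y = y)
df$dev_x <- df$x - mean_x             # Step 3: deviations of x from its mean
df$dev_y <- df$y - mean_y             # Step 4: deviations of y from its mean
df$rectangle <- df$dev_x * df$dev_y   # Step 5: signed area of each rectangle

total_area <- sum(df$rectangle)       # Step 6: add the areas
n <- nrow(df)
cov_manual <- total_area / (n - 1)    # Step 7: divide by n - 1

# For comparison: the plain average of the products divides by n instead
avg_area <- total_area / n

print(cov_manual)
print(avg_area)
```

Dividing by n - 1 rather than n is what makes the result slightly larger in magnitude than the simple average of the rectangle areas.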

That’s it.

Now we can compare this visualized result to what R's built-in covariance function, cov(), returns.

# Manual calculation: summed rectangle areas divided by n - 1
sum(df$rectangle) / (nrow(df) - 1)
## [1] 0.4766744
cov(x, y) # Built-in calculation of the **sample** covariance
## [1] 0.4766744

Great. It’s a match!

Discussion question

What would the units of the unadjusted covariance be for life expectancy in years and per capita GDP in dollars?

Note: The normalized version of covariance is Pearson’s correlation coefficient.
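
As a rough illustration of that normalization (a sketch with made-up numbers, not the post's data): dividing the covariance by the product of the two standard deviations gives Pearson's r, which R computes with `cor()`. Rescaling a variable changes the covariance but leaves the correlation unchanged.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)

# Normalize the covariance by both standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)   # Pearson is the default method

# Changing units (here, multiplying x by 100) scales the covariance...
cov_rescaled <- cov(100 * x, y)
# ...but not the correlation
r_rescaled <- cor(100 * x, y)

print(c(r_manual, r_builtin, r_rescaled))
print(c(cov(x, y), cov_rescaled))
```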

References

R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Evangeline Reynolds
Visiting Teaching Assistant Professor
