7.3 Dependence Measures

Now that we’ve covered univariate measures of central tendency and dispersion, we need to talk about bivariate measures. Univariate measures describe a single variable. Bivariate measures describe the relationship between two variables. To quantify such relationships, we use dependence measures, which measure the degree to which the values of one variable depend on (or are related to) the values of another.

There is a distinction between a dependence measure and a dependence relationship. The first is simply a bivariate measure of how one variable is related to the other. A dependence relationship, on the other hand, implies a deeper connection: not only are the two variables related, but information about one variable gives us information about the other.

There are two main measures of dependence, covariance and correlation, and they are closely related, since one is the other rescaled to lie between -1 and 1. Covariance is a measure of the joint variability of two variables. It is the expectation of the product of each variable’s deviation from its own expectation:

\[ \operatorname{cov}(x, y) = \operatorname{E} \left[ \left( x - \operatorname{E}(x) \right) \cdot \left( y - \operatorname{E}(y) \right) \right]. \qquad(7)\]

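A large positive or negative covariance indicates strong positive or negative joint variability, respectively. A positive covariance means that above-average values of one variable tend to occur together with above-average values of the other. A negative covariance means that above-average values of one variable tend to occur together with below-average values of the other.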
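To make Equation 7 concrete, here is a minimal sketch in Julia that computes a sample covariance by hand and compares it with `cov` from the `Statistics` standard library. The data and the helper name `cov_manual` are hypothetical and chosen only for illustration. We also assume the sample version of the formula, which divides by \(n - 1\) as `cov` does by default.

```julia
using Statistics

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Sample analogue of Equation 7: the average product of deviations from the mean.
# Dividing by n - 1 matches the default behavior of Statistics.cov.
cov_manual(x, y) = sum((x .- mean(x)) .* (y .- mean(y))) / (length(x) - 1)

cov_manual(x, y)   # agrees with cov(x, y)
cov_manual(x, -y)  # flipping the sign of y flips the sign of the covariance
```

With these values, the deviations multiply and sum to 6, so the sample covariance is 6 / 4 = 1.5.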

Interpreting a given covariance requires knowing the units of measurement of both underlying variables, because the covariance is expressed in the product of those units. That is why most of the time we use the correlation, which is a normalized covariance. The correlation is dimensionless, because the units of measurement cancel in the normalization, and it always lies between -1 and 1. Mathematically speaking, the correlation is the covariance divided by the product of the variables’ standard deviations, and is denoted by the Greek letter \(\rho\):

\[ \rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \cdot \sigma_y}, \qquad(8)\]

where \(\sigma_x\) and \(\sigma_y\) are the standard deviations of \(x\) and \(y\), respectively.
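The following sketch illustrates Equation 8 and the role of units. The heights and weights are hypothetical values made up for this example, and `cor_manual` is an illustrative helper, not a library function. Changing meters to centimeters rescales the covariance but leaves the correlation untouched.

```julia
using Statistics

# Hypothetical heights in meters and weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 68.0, 70.0, 80.0, 90.0]

# Equation 8: covariance divided by the product of the standard deviations.
cor_manual(x, y) = cov(x, y) / (std(x) * std(y))

# The same heights expressed in centimeters.
height_cm = 100 .* height_m

cov(height_m, weight_kg)          # depends on the units of both variables
cov(height_cm, weight_kg)         # 100 times larger after the change of units
cor_manual(height_m, weight_kg)   # agrees with cor(height_m, weight_kg)
cor_manual(height_cm, weight_kg)  # unchanged by the change of units
```

In practice we would call `cor` from `Statistics` directly. The manual version is only there to mirror the formula.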

In Figure 55, we can see some correlations and their underlying scatter plots for 50 randomly generated observations.

Figure 55: Correlation for 50 Random Bivariate Observations.

As we can see, the correlation captures the linear association between variables. The slope of the dashed line reflects the sign of the correlation. The further the correlation is from zero and the closer it is to \(\pm 1\), the stronger the association between the variables. Finally, the sign of the correlation denotes the type of relationship between variables. A positive correlation implies a positive relationship: an increase in one variable is associated with an increase in the other. A negative correlation implies a negative (inverse) relationship: an increase in one variable is associated with a decrease in the other.
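As a small illustration of these points, the sketch below uses hypothetical toy data to show the two extremes and one cautionary case. A perfectly linear relationship gives a correlation of 1 or -1 depending on its direction. A perfectly symmetric quadratic relationship gives a correlation of zero, even though one variable fully determines the other.

```julia
using Statistics

# Hypothetical toy data, symmetric around zero.
x = collect(-2.0:2.0)    # [-2.0, -1.0, 0.0, 1.0, 2.0]

y_up = 2 .* x .+ 1       # perfectly linear, increasing
y_down = -3 .* x         # perfectly linear, decreasing
y_quad = x .^ 2          # fully determined by x, but not linearly

cor(x, y_up)    # approximately 1
cor(x, y_down)  # approximately -1
cor(x, y_quad)  # approximately 0: there is no linear association to detect
```

So a correlation near zero tells us only that there is no linear association. It does not tell us that the variables are unrelated.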

DRAFT - CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso