Skewness

Skewness is defined as asymmetry in the distribution of the sample data values: values on one side of the distribution tend to be farther from the 'middle' than values on the other side. For skewed data, the usual measures of location give different values. For example, mode < median < mean would indicate positive (or right) skewness.
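As a small illustration of how the measures of location separate for skewed data, the sketch below uses Python's standard `statistics` module. The sample is hypothetical, invented only so that the long right tail and the resulting ordering are easy to see.

```python
import statistics

# Hypothetical sample with a long right tail (values invented for illustration).
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(sample)      # most frequent value
median = statistics.median(sample)  # middle of the sorted sample
mean = statistics.mean(sample)      # arithmetic average

print(mode, median, mean)

# For this right-skewed sample the ordering mode < median < mean holds.
print(mode < median < mean)
```

The few large values pull the mean upward while leaving the mode and median almost unchanged, which is why the three measures spread apart in this order.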

Positive (or right) skewness is more common than negative (or left) skewness.

For univariate data $Y_1, Y_2, \ldots, Y_N$, the formula for skewness is:

\[ \text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3} \]

where $\bar{Y}$ is the mean, $s$ is the standard deviation, and $N$ is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have skewness near zero. Negative values for the skewness indicate data that are skewed left, and positive values indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
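The sign convention described above can be checked with a minimal sketch of a sample skewness calculation. It assumes the common moment-based form, in which the third central moment is divided by the cube of the standard deviation and both use N (not N - 1) in the denominator. The function name `sample_skewness` and the data sets are hypothetical.

```python
import math

def sample_skewness(values):
    """Third central moment divided by the cube of the standard deviation.

    Assumption: both moments use N (not N - 1) in the denominator.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n  # second central moment
    m3 = sum((y - mean) ** 3 for y in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / math.pow(m2, 1.5)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]            # hypothetical failure times, bounded below
left_skewed = [-y for y in right_skewed]   # mirror image of the same data

print(sample_skewness(symmetric))     # zero for symmetric data
print(sample_skewness(right_skewed))  # positive: long right tail
print(sample_skewness(left_skewed))   # negative: long left tail
```

Mirroring the data about zero flips the sign of the skewness without changing its magnitude, which matches the left/right interpretation in the text.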

Properties of skewness

Skewness can be infinite, as when

\[ \Pr[X > x] = x^{-3} \quad \text{for } x > 1, \qquad \Pr[X < 1] = 0 \]

or undefined, as when

\[ \Pr[X < x] = \frac{(1 - x)^{-3}}{2} \quad \text{for negative } x, \]

\[ \Pr[X > x] = \frac{(1 + x)^{-3}}{2} \quad \text{for positive } x. \]

In this latter example, the third cumulant is undefined, because the two heavy tails contribute $+\infty$ and $-\infty$ to the third moment. One can also have distributions such as

\[ \Pr[X > x] = x^{-2} \quad \text{for } x > 1, \qquad \Pr[X < 1] = 0 \]

where both the second and third cumulants are infinite, so the skewness is again undefined.

If Y is the sum of n independent random variables, all with the same distribution as X, then the third cumulant of Y is n times that of X, and the second cumulant of Y is likewise n times that of X. Because skewness is the third cumulant divided by the 3/2 power of the second cumulant, the factors of n combine as $n / n^{3/2}$, so

\[ \operatorname{Skew}[Y] = \frac{\operatorname{Skew}[X]}{\sqrt{n}}. \]

This shows that the skewness of the sum shrinks as n grows, consistent with the sum approaching a Gaussian distribution in accordance with the central limit theorem.
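The relation Skew[Y] = Skew[X] / √n can be verified exactly for a small discrete distribution. The sketch below is only an illustration: the distribution `x_pmf`, the helper names `convolve` and `skewness`, and the choice n = 4 are all assumptions made for the example. It builds the distribution of the sum by repeated convolution and compares the two sides.

```python
import math

def convolve(p, q):
    """PMF of the sum of two independent variables with PMFs p and q.

    A PMF is represented as a dict mapping value -> probability.
    """
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def skewness(pmf):
    """Third cumulant divided by the 3/2 power of the second cumulant."""
    mean = sum(x * p for x, p in pmf.items())
    k2 = sum((x - mean) ** 2 * p for x, p in pmf.items())  # variance
    k3 = sum((x - mean) ** 3 * p for x, p in pmf.items())  # third central moment
    return k3 / k2 ** 1.5

# Hypothetical right-skewed distribution for X.
x_pmf = {0: 0.7, 1: 0.2, 5: 0.1}

n = 4
y_pmf = x_pmf
for _ in range(n - 1):
    y_pmf = convolve(y_pmf, x_pmf)  # distribution of the sum of n copies of X

print(skewness(y_pmf))               # skewness of the sum Y
print(skewness(x_pmf) / math.sqrt(n))  # prediction Skew[X] / sqrt(n)
```

The two printed values should agree up to floating-point rounding. The check relies on the second and third cumulants being finite, unlike the heavy-tailed examples given earlier.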

