Mean
The mean is the sum of all the elements of a set divided by the number of elements in the set.
Mean = Sum of all the set elements / Number of elements
For example, if a data set consists of the 10 numbers from 1 to 10 (1,2,3,4,5,6,7,8,9,10), then Mean = (1+2+3+4+5+6+7+8+9+10)/10 = 55/10 = 5.5. The mean is the same as the arithmetic average.
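The calculation above can be sketched in a few lines of Python. This is only an illustration; the function name `mean` is chosen here and is not part of the original text.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    # Sum of all the set elements / number of elements
    return sum(values) / len(values)


data = list(range(1, 11))  # the numbers 1 through 10
result = mean(data)
print(result)  # 5.5
```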
Median
The median is the middle number in an ordered sequence of numbers. To find the median, write the elements of the set in increasing order, count them, and pick the middle one.
If a set consists of an odd number of elements, the middle value is the median of the set. For example, if the elements of a data set are 1,2,3,4,5, then the median is 3.
If the set consists of an even number of elements, the median is the average of the two middle values. For example, if the elements of the set are 1,2,3,4,5,6, then the median is (3+4)/2 = 3.5.
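As a rough sketch, the odd and even cases can be handled like this in Python. The function name `median` is an assumption made for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # write the elements in increasing order
    n = len(ordered)          # count the elements
    mid = n // 2
    if n % 2 == 1:
        # Odd number of elements: the middle value is the median.
        return ordered[mid]
    # Even number of elements: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([1, 2, 3, 4, 5])      # 3
even_median = median([1, 2, 3, 4, 5, 6])  # 3.5
```

Sorting a copy with `sorted` leaves the caller's list unchanged, so the input does not have to be ordered beforehand.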
The median is a useful measure when the data set contains outliers. An outlier is an element that lies far away from the other elements of the data set.
For example, if a set consists of the elements 1, 2, 3, 4, 5000, then the element 5000 is an outlier. The mean of this data set is (1+2+3+4+5000)/5 = 5010/5 = 1002, which is misleading because it is far from four of the five elements. The median is 3, which summarizes the set better.
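To see the effect of the outlier directly, the short sketch below compares the two measures using Python's standard `statistics` module. The variable names are chosen only for this example.

```python
import statistics

data = [1, 2, 3, 4, 5000]  # 5000 is the outlier

mean_value = statistics.mean(data)      # pulled towards the outlier
median_value = statistics.median(data)  # unaffected by the outlier

print(mean_value)    # 1002
print(median_value)  # 3
```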
Mode
The mode of a data set is the element that occurs most frequently. If all the elements of the data set are different, then there is no mode.
For example, if the elements of a data set are (1,2,3,3,4,5,6,3,8,9), then the mode is 3, which occurs three times while every other element occurs once.
A data set can have two modes (bimodal), three modes (trimodal), or more when several elements share the highest frequency.
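One possible way to find the mode, including the bimodal and "no mode" cases, is sketched below. The function name `modes` is an assumption for this example, and returning an empty list when every element is different follows the convention used in this article.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of all modes; an empty list means there is no mode."""
    counts = Counter(values)  # how often each element occurs
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every element occurs exactly once, so there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


single = modes([1, 2, 3, 3, 4, 5, 6, 3, 8, 9])  # [3]
bimodal = modes([1, 1, 2, 2, 3])                # [1, 2]
no_mode = modes([1, 2, 3, 4, 5])                # []
```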
In a normal distribution, the mode, the mean, and the median are all equal. In other distributions, the mode may differ from the other two.
Variance
Variance measures how far a data set is spread out from its mean. If all the elements of the data set are the same, then the variance is zero. For example, the data set 1,1,1,1,1 has a variance of zero (since the numbers are identical).
The variance for the entire population of a data set can be calculated with the steps below:
- Find the mean of the numbers in the data set.
- Find the difference between each value in the set and the mean.
- Square those differences.
- Add the squared differences.
- Divide that sum by the number of elements, which gives the average of the squared differences.
For the data set 1,2,3,4,5, the variance of the population is calculated as 2.
-> Mean of the data set = 3
-> Differences = (1-3) = -2, (2-3) = -1, (3-3) = 0, (4-3) = 1, (5-3) = 2
-> Squares of the differences = 4, 1, 0, 1, 4
-> Sum of the squared differences = 10
-> Average of the squared differences = 10/5 = 2
So, the formula to calculate the variance of a population is as below:
σ² = Σ (x − µ)² / N
σ² = variance of the population, N = number of elements
x = individual elements, µ = mean of the population
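The population variance steps described above can be sketched in Python as follows. The function name `population_variance` and the intermediate variable names are assumptions made for the illustration.

```python
def population_variance(values):
    """Return the variance of a complete population of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n                             # mean of the data set
    squared_diffs = [(x - mu) ** 2 for x in values]  # difference from the mean, squared
    return sum(squared_diffs) / n                    # average of the squared differences


spread = population_variance([1, 2, 3, 4, 5])     # 2.0
identical = population_variance([1, 1, 1, 1, 1])  # 0.0
```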
If we are calculating the variance of a sample rather than of the whole population, the formula is a little different.
In the last step, instead of dividing the sum by the total number of elements (n), we divide by (n-1), i.e. (5-1) = 4 for this example.
So the variance of the sample data set is 10/4 = 2.5.
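The formula to calculate the variance of a sample is as below:
s² = Σ (x − x̄)² / (n − 1)
s² = variance of the sample, n = number of elements in the sample
x = individual elements, x̄ = mean of the sample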
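The sketch below shows the only difference between the two calculations, the divisor, and checks the result against Python's standard `statistics` module (`pvariance` for a population, `variance` for a sample). The function name `variance_of` and its `sample` flag are assumptions for this example.

```python
import statistics


def variance_of(values, sample=False):
    """Return the population variance, or the sample variance if sample=True."""
    n = len(values)
    divisor = n - 1 if sample else n  # the only difference between the two formulas
    if divisor <= 0:
        raise ValueError("not enough elements to calculate the variance")
    mean = sum(values) / n
    total = sum((x - mean) ** 2 for x in values)  # sum of the squared differences
    return total / divisor


data = [1, 2, 3, 4, 5]
population_result = variance_of(data)           # 10 / 5 = 2.0
sample_result = variance_of(data, sample=True)  # 10 / 4 = 2.5

# The standard library gives the same values.
library_population = statistics.pvariance(data)
library_sample = statistics.variance(data)
```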
Standard Deviation
The standard deviation tells how far the numbers in a group are spread out from the average (mean). A low standard deviation means that most of the numbers are close to the average. A high standard deviation means that the numbers are spread out over a wider range.
The standard deviation is calculated by taking the square root of the variance. Both quantities describe spread, but the standard deviation expresses it as a typical distance from the mean rather than as an average squared distance.
Since the variance is calculated from squared differences, it is expressed in squared units (for example, square metres for data measured in metres), which makes it hard to interpret directly. Taking the square root brings the result back to the same unit as the elements of the set. Hence, the standard deviation is usually the easier quantity to interpret and report.
For example, for the data set above (1,2,3,4,5), the variance of the population is 2.
So, the standard deviation of the population is Square Root(Variance) = √2 ≈ 1.414.
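As a small illustration, the standard deviation of the same data set can be computed by taking the square root of the variance. The sketch uses Python's standard `math` and `statistics` modules; the variable names are chosen only for this example.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# Population: variance is 2, so the standard deviation is sqrt(2).
population_std = math.sqrt(statistics.pvariance(data))

# Sample: variance is 2.5, so the standard deviation is sqrt(2.5).
sample_std = math.sqrt(statistics.variance(data))

print(round(population_std, 3))  # 1.414
print(round(sample_std, 3))      # 1.581
```

The `statistics` module also provides `pstdev` and `stdev`, which return the population and sample standard deviation directly.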