What is Median?
The median of a finite list of numbers is the "middle" number, when those numbers are listed in order from smallest to greatest.
If there is an odd number of numbers, the middle one is picked. For example, consider the list of numbers
This list contains seven numbers. The median is the fourth of them, which is 6.
If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values. For example, in the data set:
The median is the mean of the middle two numbers: this is (4 + 5) / 2 , which is 4.5. (In more technical terms, this interprets the median as the fully trimmed mid-range).
In statistics and probability theory, a median is a value separating the higher half from the lower half of a data sample, a population or a probability distribution. For a data set, it may be thought of as "the middle" value. For example, the basic advantage of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed so much by a small proportion of extremely large or small values, and so it may give a better idea of a "typical" value. For example, in understanding statistics like household income or assets, which vary greatly, the mean may be skewed by a small number of extremely high or low values. Median income, for example, may be a better way to suggest what a "typical" income is. Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
Formally, a median of a population is any value such that at most half of the population is less than the proposed median and at most half is greater than the proposed median. As seen above, medians may not be unique. If each set contains less than half the population, then some of the population is exactly equal to the unique median.
The median is well-defined for any ordered (one-dimensional) data, and is independent of any distance metric. The median can thus be applied to classes which are ranked but not numerical (e.g. working out a median grade when students are graded from A to F), although the result might be halfway between classes if there is an even number of cases.
A geometric median, on the other hand, is defined in any number of dimensions. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.
There is no widely accepted standard notation for the median, but some authors represent the median of a variable x either as x͂ or as μ1/2 sometimes also M. In any of these cases, the use of these or other symbols for the median needs to be explicitly defined when they are introduced.
The median is a special case of other ways of summarising the typical values associated with a statistical distribution: it is the 2nd quartile, 5th decile, and 50th percentile.
The median can be used as a measure of location when one attaches reduced importance to extreme values, typically because a distribution is skewed, extreme values are not known, or outliers are untrustworthy, i.e., may be measurement/transcription errors.
For example, consider the multiset
1, 2, 2, 2, 3, 14.
The median is 2 in this case, (as is the mode), and it might be seen as a better indication of the center than the arithmetic mean of 4, which is larger than all-but-one of the values. However, the widely cited empirical relationship that the mean is shifted "further into the tail" of a distribution than the median is not generally true. At most, one can say that the two statistics cannot be "too far" apart.
As a median is based on the middle data in a set, it is not necessary to know the value of extreme results in order to calculate it. For example, in a psychology test investigating the time needed to solve a problem, if a small number of people failed to solve the problem at all in the given time a median can still be calculated.
Because the median is simple to understand and easy to calculate, while also a robust approximation to the mean, the median is a popular summary statistic in descriptive statistics. In this context, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation.
For practical purposes, different measures of location and dispersion are often compared on the basis of how well the corresponding population values can be estimated from a sample of data. The median, estimated using the sample median, has good properties in this regard. While it is not usually optimal if a given population distribution is assumed, its properties are always reasonably good. For example, a comparison of the efficiency of candidate estimators shows that the sample mean is more statistically efficient when — and only when — data is uncontaminated by data from heavy-tailed distributions or from mixtures of distributions. Even then, the median has a 64% efficiency compared to the minimum-variance mean (for large normal samples), which is to say the variance of the median will be ~50% greater than the variance of the mean.