Standard deviation notation. Percent Standard Deviation

In this article, I will explain how to find the standard deviation. This material is extremely important for a full understanding of mathematics, so a math tutor should devote a separate lesson, or even several, to it. In this article, you will find a link to a detailed and understandable video tutorial that explains what the standard deviation is and how to find it.

The standard deviation makes it possible to estimate the spread of the values obtained when measuring a certain parameter. It is denoted by the symbol σ (the Greek letter sigma).

The formula is quite simple: to find the standard deviation, take the square root of the variance. So the next question is: what is variance?

What is variance

The definition of variance is as follows: the variance is the arithmetic mean of the squared deviations of the values from their mean.

To find the variance, perform the following calculations in order:

  • Determine the mean (the simple arithmetic mean of the values).
  • Subtract the mean from each value and square the result (this gives the squared difference).
  • Calculate the arithmetic mean of these squared differences (why squares are used is explained below).
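
The three steps above can be sketched in Python as follows. This is a minimal illustration that assumes the values form a complete population, so the sum is divided by N. The function name population_variance is hypothetical and does not come from any library.

```python
def population_variance(values):
    """Variance of a complete population, computed with the three steps above."""
    n = len(values)
    # Step 1: the simple arithmetic mean of the values
    mean = sum(values) / n
    # Step 2: the squared difference between each value and the mean
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Step 3: the arithmetic mean of the squared differences
    return sum(squared_diffs) / n


heights_mm = [600, 470, 170, 430, 300]  # the dog heights used in the example below
print(population_variance(heights_mm))  # 21704.0
```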

Let's look at an example. Suppose you and your friends decide to measure the heights of your dogs (in millimeters). The measurements (at the withers) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

Let's find the mean first. As you already know, for this you add up all the measured values and divide by the number of measurements:

$$\mu = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394 \text{ mm}$$

So the mean (arithmetic mean) is 394 mm.

Now we find the deviation of each dog's height from the mean:

$$600 - 394 = 206,\quad 470 - 394 = 76,\quad 170 - 394 = -224,\quad 430 - 394 = 36,\quad 300 - 394 = -94$$

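Finally, to calculate the variance, square each of these differences and take the arithmetic mean of the results:

$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704 \text{ mm}^2$$

Thus, the variance is 21704 mm².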

How to find the standard deviation

So how do we calculate the standard deviation once we know the variance? As we remember, we take its square root. That is, the standard deviation is:

$$\sigma = \sqrt{21704} \approx 147 \text{ mm}$$

(rounded to the nearest whole millimeter).

Using the standard deviation, we can see that some dogs (for example, Rottweilers) are very big. But there are also very small dogs (for example, dachshunds, but you should not tell them this).

The most interesting thing is that the standard deviation carries useful information. We can now show which of the height measurements fall inside the interval obtained by laying off one standard deviation on both sides of the mean: from 394 − 147 = 247 mm to 394 + 147 = 541 mm.

That is, the standard deviation gives us a "standard" way to find out which values are normal (close to the statistical average) and which are extraordinarily large or, conversely, small.

Standard deviation of a sample

But things are a little different when we analyze sample data. In our example, we considered the whole population: our 5 dogs were the only dogs in the world that interested us.

But if the data is a sample (values chosen from a larger population), the calculations must be done differently.

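If there are N values, the variance is calculated with N − 1 instead of N in the denominator:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

All other calculations are done in the same way, including the calculation of the mean.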

For example, if our five dogs are just a sample from a population of dogs (all dogs on the planet), we must divide by 4 instead of 5:

$$s^2 = \frac{108520}{4} = 27130 \text{ mm}^2$$

In this case, the sample standard deviation is $s = \sqrt{27130} \approx 165$ mm (rounded to the nearest whole number).

We can say that we made a "correction" for the case when our values are just a small sample.
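
Python's standard library provides both versions in its statistics module. The functions pvariance and pstdev divide by N (population), and variance and stdev divide by N − 1 (sample). The sketch below only cross-checks the dog example against those functions and is not part of the original method.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

# Population formulas: divide by N = 5
pop_var = statistics.pvariance(heights_mm)  # 21704 (mm^2)
pop_sd = statistics.pstdev(heights_mm)      # about 147.3 mm

# Sample formulas: divide by N - 1 = 4
samp_var = statistics.variance(heights_mm)  # 27130 (mm^2)
samp_sd = statistics.stdev(heights_mm)      # about 164.7 mm

print(pop_var, round(pop_sd))
print(samp_var, round(samp_sd))
```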

Note. Why exactly the squares of the differences?

But why do we square the differences when calculating the variance? Suppose that, when measuring some parameter, you obtained the following deviations from the mean: 4, 4, −4, −4. If we simply average the deviations, the negative values cancel out the positive ones:

$$\frac{4 + 4 - 4 - 4}{4} = 0$$

So this option is useless. Then maybe we should try the absolute values of the deviations? For the first set this gives:

$$\frac{|4| + |4| + |-4| + |-4|}{4} = 4$$

At first glance this works well (the resulting value, by the way, is called the mean absolute deviation), but not in all cases. Let's try another example. Suppose the deviations are 7, 1, −6, −2. Then the mean absolute deviation is:

$$\frac{|7| + |1| + |-6| + |-2|}{4} = \frac{16}{4} = 4$$

Blimey! We again got 4, although these differences are spread out much more.

Now let's see what happens if we square the differences (and then take the square root of their mean).

For the first example, you get:

$$\sqrt{\frac{4^2 + 4^2 + (-4)^2 + (-4)^2}{4}} = \sqrt{\frac{64}{4}} = 4$$

For the second example, you get:

$$\sqrt{\frac{7^2 + 1^2 + (-6)^2 + (-2)^2}{4}} = \sqrt{\frac{90}{4}} \approx 4.74$$

Now that's a completely different matter! The root-mean-square deviation grows as the spread of the differences grows, which is what we were aiming for.
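
The comparison above can be reproduced numerically. In this sketch, the helper names mean_abs_dev and rms_dev are hypothetical. As in the two examples, the inputs are already deviations from the mean.

```python
import math


def mean_abs_dev(diffs):
    """Mean of the absolute deviations."""
    return sum(abs(d) for d in diffs) / len(diffs)


def rms_dev(diffs):
    """Square root of the mean of the squared deviations."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


first = [4, 4, -4, -4]
second = [7, 1, -6, -2]

print(sum(first) / len(first))                    # 0.0: the signs cancel
print(mean_abs_dev(first), mean_abs_dev(second))  # 4.0 4.0
print(rms_dev(first), round(rms_dev(second), 2))  # 4.0 4.74
```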

In fact, this method uses the same idea as calculating the distance between two points (the square root of a sum of squares), only applied in a different way.

From a mathematical point of view, squares and square roots have more convenient properties than absolute values. This is what makes the standard deviation applicable to other mathematical problems.

Sergey Valerievich told you how to find the standard deviation

Mathematical expectation and variance

Suppose we measure a random variable N times. For example, we measure the wind speed ten times and want to find its average value. How is the mean value related to the distribution function?

Let's throw a die a large number of times, N. The number of points that comes up on each throw is a random variable and can take any natural value from 1 to 6. As N → ∞, the arithmetic mean of the points over all throws tends to a very specific number, the mathematical expectation Mx. In this case Mx = 3.5.

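Where does this value come from? Suppose that in N trials 1 point came up $n_1$ times, 2 points came up $n_2$ times, and so on. Then, as N → ∞, the share of outcomes in which one point came up tends to 1/6: $n_1/N \to 1/6$. Similarly, $n_2/N \to 1/6, \dots, n_6/N \to 1/6$. Hence

$$Mx = \lim_{N\to\infty} \frac{1\cdot n_1 + 2\cdot n_2 + \dots + 6\cdot n_6}{N} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$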
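
A quick simulation shows the limit described above: as N grows, the average number of points approaches Mx = 3.5. This sketch uses Python's random module with a fixed seed only to make runs reproducible. The printed averages depend on that seed, and the function name average_roll is hypothetical.

```python
import random


def average_roll(n_throws, seed=0):
    """Average number of points over n_throws simulated throws of a fair die."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = sum(rng.randint(1, 6) for _ in range(n_throws))
    return total / n_throws


for n in (10, 1_000, 100_000):
    print(n, average_roll(n))
```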

Now assume that we know the distribution law of the random variable $x$. That is, we know that $x$ can take the values $x_1, x_2, \dots, x_k$ with probabilities $p_1, p_2, \dots, p_k$.

The expected value $Mx$ of the random variable $x$ equals:

$$Mx = x_1 p_1 + x_2 p_2 + \dots + x_k p_k = \sum_{i=1}^{k} x_i p_i$$

Answer: 2.8.

The mathematical expectation is not always a reasonable estimate of a random variable. For example, to estimate average wages it is more reasonable to use the median: the value such that the number of people earning less than the median salary equals the number earning more.

The median of a random variable $x$ is a number $x_{1/2}$ such that $p(x < x_{1/2}) = 1/2$.

In other words, the probability $p_1$ that $x$ is less than $x_{1/2}$ and the probability $p_2$ that $x$ is greater than $x_{1/2}$ are the same and equal to 1/2. The median is not uniquely determined for all distributions.

Back to the random variable $x$, which can take the values $x_1, x_2, \dots, x_k$ with probabilities $p_1, p_2, \dots, p_k$.

The variance $Dx$ of a random variable $x$ is the average value of the squared deviation of the random variable from its mathematical expectation:

$$Dx = M\big[(x - Mx)^2\big] = \sum_{i=1}^{k} (x_i - Mx)^2 p_i$$
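
Both definitions translate directly into code. This is a minimal sketch that assumes the distribution is given as parallel lists of values xs and probabilities ps that sum to 1. The function names are hypothetical. The two-point distribution in the usage lines is also made up, and Fraction keeps the arithmetic exact.

```python
import math
from fractions import Fraction


def expectation(xs, ps):
    """Mx: the sum of x_i * p_i over all values of a discrete random variable."""
    return sum(x * p for x, p in zip(xs, ps))


def variance(xs, ps):
    """Dx: the sum of (x_i - Mx)^2 * p_i, the mean squared deviation from Mx."""
    m = expectation(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))


def std_dev(xs, ps):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs, ps))


# Hypothetical two-point distribution: 0 or 1 with equal probability.
xs = [0, 1]
ps = [Fraction(1, 2), Fraction(1, 2)]
print(expectation(xs, ps), variance(xs, ps))  # 1/2 1/4
```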

Example 2

Under the conditions of the previous example, calculate the variance and standard deviation of a random variable x.

Answer: 0.16; 0.4.

Example 3

Find the probability distribution of the number of points on a single throw of a die, as well as the median, the mathematical expectation, the variance and the standard deviation.

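All faces are equally probable, so the distribution is: $x_i = 1, 2, 3, 4, 5, 6$, each with probability $p_i = 1/6$.

Median: the condition $p(x < x_{1/2}) = 1/2$ holds for any $x_{1/2}$ with $3 < x_{1/2} \le 4$, so the median of this distribution is not unique.

$$Mx = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$

$$Dx = \frac{(1-3.5)^2 + (2-3.5)^2 + \dots + (6-3.5)^2}{6} = \frac{17.5}{6} = \frac{35}{12} \approx 2.92$$

Standard deviation: $\sigma = \sqrt{Dx} \approx 1.71$. The values clearly deviate from the mean by a lot.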
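
The results of Example 3 can be checked with exact fractions. This is a sketch in which the helper prob_less_than is hypothetical. It also confirms that the median condition holds for more than one value.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # every face is equally probable

m = sum(x * p for x in faces)             # Mx
d = sum((x - m) ** 2 * p for x in faces)  # Dx
sigma = math.sqrt(d)


def prob_less_than(t):
    """p(x < t) for a single throw of a fair die."""
    return sum(p for x in faces if x < t)


print(m, d, round(sigma, 2))  # 7/2 35/12 1.71
# Any t with 3 < t <= 4 gives p(x < t) = 1/2, so the median is not unique here.
print(prob_less_than(3.5), prob_less_than(4))  # 1/2 1/2
```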

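Properties of mathematical expectation:

  • The mathematical expectation of a sum of random variables equals the sum of their mathematical expectations: $M(x + y) = Mx + My$.
  • The mathematical expectation of a product of independent random variables equals the product of their mathematical expectations: $M(xy) = Mx \cdot My$.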

Example 4

Find the mathematical expectation of the sum and of the product of the points rolled on two dice.

In Example 3, we found that for one die $M(x) = 3.5$. So for two dice

$$M(x + y) = 3.5 + 3.5 = 7, \qquad M(xy) = 3.5 \cdot 3.5 = 12.25$$
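
Both results in Example 4 can be confirmed by enumerating all 36 equally likely outcomes of two dice. This is a brute-force check, not a replacement for the properties above.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 36)  # each ordered pair of faces is equally likely
pairs = list(product(range(1, 7), repeat=2))

mean_sum = sum((x + y) * p for x, y in pairs)
mean_product = sum(x * y * p for x, y in pairs)

print(mean_sum, mean_product)  # 7 49/4
```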

Properties of variance:

  • The variance of a sum of independent random variables equals the sum of their variances:

$$D(x + y) = Dx + Dy$$

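Let $y$ be the total number of points in $N$ dice rolls. By this property, $Dy = N \cdot Dx$. A constant factor comes out of the variance squared, so for the average number of points per roll, $y/N$, we get $D(y/N) = Dy/N^2 = Dx/N$, and its standard deviation is $\sigma_x/\sqrt{N}$.

This result holds not only for dice rolls. In many cases, it determines how accurately the mathematical expectation can be measured empirically: as the number of measurements $N$ grows, the spread of the average around the mean, that is, its standard deviation, decreases in proportion to $1/\sqrt{N}$.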
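
The additivity of variance can be checked exactly by building the distribution of the total on N dice by repeated convolution. This is a sketch with hypothetical function names. For a fair die $Dx = 35/12$, so the variance of the total should be $N \cdot 35/12$ and the variance of the average $(35/12)/N$.

```python
from fractions import Fraction


def sum_distribution(n_dice):
    """Exact distribution of the total points on n_dice fair dice (by convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + prob / 6
        dist = nxt
    return dist


def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    return sum((v - m) ** 2 * p for v, p in dist.items())


for n in (1, 2, 5):
    d_sum = variance(sum_distribution(n))  # Dy, expected to be n * 35/12
    d_mean = d_sum / n**2                  # D(y/N) = Dy / N^2, expected (35/12) / n
    print(n, d_sum, d_mean)
```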

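The variance of a random variable is related to the mathematical expectation of its square by the following relation:

$$Dx = M(x^2) - (Mx)^2$$

To prove it, expand $(x - Mx)^2 = x^2 - 2x\,Mx + (Mx)^2$ and take the mathematical expectations of both sides of this equality. By definition, the expectation of the left side is $M\big[(x - Mx)^2\big] = Dx$.

By the properties of mathematical expectation (a constant factor can be taken outside $M$), the mathematical expectation of the right side equals

$$M(x^2) - 2\,Mx \cdot Mx + (Mx)^2 = M(x^2) - (Mx)^2$$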
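
A small numerical check of this identity follows. The values and probabilities are made up for illustration, and Fraction keeps both computations exact so they can be compared with ==.

```python
from fractions import Fraction

# A hypothetical discrete distribution (values and probabilities are made up).
xs = [1, 2, 5]
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

m = sum(x * p for x, p in zip(xs, ps))
by_definition = sum((x - m) ** 2 * p for x, p in zip(xs, ps))  # M[(x - Mx)^2]
by_identity = sum(x * x * p for x, p in zip(xs, ps)) - m ** 2  # M(x^2) - (Mx)^2

print(by_definition, by_identity, by_definition == by_identity)  # 2 2 True
```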

Standard deviation

The standard deviation equals the square root of the variance:

$$\sigma = \sqrt{Dx}$$

When determining the standard deviation for a sufficiently large population (n > 30), the following formula is used:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

The standard deviation is a summary characteristic of how much a feature varies across a population. It equals the square root of the mean of the squared deviations of the individual feature values from the arithmetic mean, i.e. the square root of the variance. It can be found as follows:

1. For ungrouped (primary) data:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

2. For a variation series:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

Transforming the standard deviation formula gives a form that is more convenient for practical calculations:

$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$
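
The variation-series formula and its transformed form can be sketched as below. The function names and the small series (values 2, 4, 6 observed 1, 2 and 1 times) are hypothetical. One caution applies to the shortcut form: with floating-point numbers and large values of similar size, subtracting two large nearly equal quantities can lose precision.

```python
import math


def grouped_std(values, freqs):
    """Standard deviation of a variation series (values x with frequencies f)."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / n)


def grouped_std_shortcut(values, freqs):
    """The same quantity via sqrt(mean of x^2 - (mean of x)^2)."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    mean_sq = sum(x * x * f for x, f in zip(values, freqs)) / n
    return math.sqrt(mean_sq - mean ** 2)


# Hypothetical series: values 2, 4, 6 observed 1, 2 and 1 times.
print(grouped_std([2, 4, 6], [1, 2, 1]), grouped_std_shortcut([2, 4, 6], [1, 2, 1]))
```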

The standard deviation shows how much, on average, the individual values deviate from their mean. It is also an absolute measure of the variation of a feature and is expressed in the same units as the values themselves, so it is easy to interpret.

For alternative (binary) features, the standard deviation formula is:

$$\sigma = \sqrt{pq}$$

where p is the proportion of units in the population that have the attribute;

q = 1 − p is the proportion of units that do not have it.
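
For a binary feature coded as 1 (has the attribute) and 0 (does not), $\sqrt{pq}$ matches the ordinary population standard deviation of the 0/1 values. The sketch below checks this on made-up data using Python's statistics.pstdev.

```python
import math
import statistics

# Hypothetical data: 1 = the unit has the attribute, 0 = it does not.
flags = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

p = sum(flags) / len(flags)  # share of units with the attribute
q = 1 - p                    # share of units without it

print(math.sqrt(p * q), statistics.pstdev(flags))  # both about 0.49
```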

The concept of mean linear deviation

The mean linear deviation is defined as the arithmetic mean of the absolute values of the deviations of the individual values from the arithmetic mean.

1. For ungrouped (primary) data:

$$d = \frac{\sum |x - \bar{x}|}{n}$$

2. For a variation series:

$$d = \frac{\sum |x - \bar{x}|\, f}{\sum f}$$

where $\sum f$ is the sum of the frequencies of the variation series.
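
Both forms of the mean linear deviation fit in one function if the frequencies default to 1. This is a sketch with a hypothetical name. For ungrouped data it computes the same quantity as Excel's AVEDEV, the average absolute deviation from the mean.

```python
def mean_linear_deviation(values, freqs=None):
    """Mean absolute deviation from the arithmetic mean; freqs is optional."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counts once
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n


print(mean_linear_deviation([600, 470, 170, 430, 300]))  # 127.2 (the dog heights)
print(mean_linear_deviation([2, 4, 6], [1, 2, 1]))       # 1.0 (hypothetical series)
```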

As a measure of dispersion, the mean absolute deviation has an obvious advantage over the range of variation: it takes every deviation into account. But this indicator also has significant drawbacks. Discarding the algebraic signs of the deviations gives it mathematical properties that are far from elementary, which greatly complicates its use in problems involving probabilistic calculations.

For this reason, the mean linear deviation is rarely used as a measure of variation in statistical practice. It is used only when summing indicators without regard to sign makes economic sense, for example, to analyze foreign trade turnover, workforce composition, the rhythm of production, etc.

Root mean square

The root mean square (quadratic mean) is used, for example, to calculate the average side of n square plots, the average diameter of tree trunks, pipes, etc. It comes in two forms: simple and weighted.

Simple root mean square. Suppose the individual values of a feature are replaced with their average and the sum of the squares of the original values must stay the same. Then the appropriate average is the quadratic mean.

It is the square root of the sum of the squares of the individual feature values divided by their number:

$$\bar{x}_{quad} = \sqrt{\frac{\sum x^2}{n}}$$

The weighted root mean square is calculated by the formula:

$$\bar{x}_{quad} = \sqrt{\frac{\sum x^2 f}{\sum f}}$$

where f is the weight (frequency).
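
The simple and weighted quadratic means can be sketched in one function, where leaving out the weights gives the simple form. The function name and the plot sides are hypothetical. The last line checks the defining property: replacing every side by the average keeps the sum of squares (the total area) unchanged.

```python
import math


def quadratic_mean(values, weights=None):
    """Root mean square: sqrt(sum x^2 / n), or sqrt(sum x^2 f / sum f) if weighted."""
    if weights is None:
        weights = [1] * len(values)
    return math.sqrt(sum(x * x * f for x, f in zip(values, weights)) / sum(weights))


# Hypothetical square plots with sides 10, 20 and 30 m.
sides = [10, 20, 30]
avg_side = quadratic_mean(sides)  # about 21.6 m
# Replacing every side by avg_side keeps the total area (sum of squares) unchanged.
print(avg_side, len(sides) * avg_side ** 2, sum(s * s for s in sides))
```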

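Cubic mean

The cubic mean is used, for example, to determine the average edge length of cubes. It also comes in two forms.

Simple cubic mean:

$$\bar{x}_{cub} = \sqrt[3]{\frac{\sum x^3}{n}}$$

Weighted cubic mean:

$$\bar{x}_{cub} = \sqrt[3]{\frac{\sum x^3 f}{\sum f}}$$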
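
By analogy with the quadratic mean, the cubic mean keeps the sum of cubes (for cubes, the total volume) unchanged. This is a sketch with a hypothetical function name and made-up edge lengths.

```python
def cubic_mean(values, weights=None):
    """Cubic mean: cube root of sum x^3 / n (or of sum x^3 f / sum f if weighted)."""
    if weights is None:
        weights = [1] * len(values)
    total = sum(x ** 3 * f for x, f in zip(values, weights))
    return (total / sum(weights)) ** (1 / 3)


# Hypothetical cubes with edges 1, 2 and 3 cm: total volume 36 cm^3.
edges = [1, 2, 3]
avg_edge = cubic_mean(edges)  # about 2.29 cm
# Three cubes with edge avg_edge hold (almost exactly) the same 36 cm^3.
print(avg_edge, len(edges) * avg_edge ** 3)
```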

When means and variances are calculated for an interval distribution series, the true values of the feature are replaced by the midpoints of the intervals, which differ from the arithmetic mean of the values within each interval. This introduces a systematic error into the variance. W. F. Sheppard showed that the error in the variance caused by using grouped data amounts to 1/12 of the square of the interval width, and that it can shift the variance either up or down.

Sheppard's correction should be applied when the distribution is close to normal, the feature varies continuously, and the series is built on a large amount of initial data (n > 500). However, in a number of cases the two errors act in opposite directions and compensate each other, so the correction can sometimes be omitted.
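
In its commonly cited form, Sheppard's correction subtracts $h^2/12$ (h is the common interval width) from the variance computed on interval midpoints. The sketch below follows that form and assumes equal-width intervals. The function names and the interval series are hypothetical.

```python
def grouped_variance(midpoints, freqs):
    """Variance of an interval series computed on the interval midpoints."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    return sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / n


def sheppard_corrected_variance(midpoints, freqs, width):
    """Grouped-data variance minus Sheppard's correction h^2 / 12 (equal interval width h)."""
    return grouped_variance(midpoints, freqs) - width ** 2 / 12


# Hypothetical interval series: width 10, midpoints 15, 25, 35.
mids, freqs = [15, 25, 35], [20, 50, 30]
print(grouped_variance(mids, freqs), sheppard_corrected_variance(mids, freqs, 10))
# 49.0 and about 40.67
```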

The smaller the variance and standard deviation, the more homogeneous the population and the more typical the mean.

In statistical practice it is often necessary to compare the variation of different features. For example, it is of great interest to compare the variation of workers' age and qualifications, length of service and wages, cost and profit, length of service and labor productivity, etc. Measures of absolute variation are unsuitable for such comparisons: the variability of work experience, expressed in years, cannot be compared with the variation of wages, expressed in rubles.

For such comparisons, and for comparing the variation of the same feature in several populations with different arithmetic means, a relative measure of variation is used: the coefficient of variation, $V = \frac{\sigma}{\bar{x}} \cdot 100\%$.

Structural averages

To characterize the central tendency of a statistical distribution, it is often sensible to use, alongside the arithmetic mean, a particular value of the feature X that characterizes the level of the series because of its position in the distribution.

This is especially important when the extreme intervals of the distribution series are open-ended (have no clear boundaries). In that case, an exact arithmetic mean is usually impossible or very difficult to obtain. The average level can then be determined by taking, for example, the value of the feature that lies in the middle of the series or the value that occurs most often in it.

Such values depend only on the frequencies, i.e., on the structure of the distribution. They are typical because of their position in the series, so they serve as characteristics of the distribution center and are called structural averages. They are used to study the internal structure of distribution series. These indicators include the mode and the median.
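
Python's statistics module provides both structural averages directly. In the sketch below the series is made up: median takes the middle of the ordered series, and mode returns the most frequent value.

```python
import statistics

# Hypothetical series of workers' wage grades.
grades = [3, 4, 4, 5, 5, 5, 6, 6]

print(statistics.median(grades))  # 5.0: the middle of the ordered series
print(statistics.mode(grades))    # 5: the most frequent value
```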

No statistical analysis is possible without calculations. In this article, we will look at how to calculate the variance, standard deviation, coefficient of variation and other statistical indicators in Excel.

Maximum and minimum value

The largest and smallest values of a data set are found with the MAX and MIN functions.

Average linear deviation

The average linear deviation is the average of the absolute (modulo) deviations from the mean in the analyzed data set. The mathematical formula is:

$$a = \frac{\sum |X - \bar{X}|}{n}$$

a is the average linear deviation,

X is the analyzed indicator,

$\bar{X}$ is the average value of the indicator,

n is the number of values in the analyzed data set.

In Excel this function is called AVEDEV (СРОТКЛ in the Russian-language version).

After selecting the AVEDEV function, specify the data range for the calculation and click "OK".

Variance

Perhaps not everyone knows what variance is, so I will explain: it is a measure of the spread of data around the mathematical expectation. However, usually only a sample is available, so the following variance formula is used:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n}$$

s² is the sample variance calculated from the observations,

X is an individual value,

$\bar{X}$ is the arithmetic mean over the sample,

n is the number of values in the analyzed data set.

The corresponding Excel function is VAR.P (ДИСП.Г in the Russian-language version). For relatively small samples (up to about 30 observations), you should use the unbiased sample variance, which is calculated by the following formula:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

As you can see, the only difference is in the denominator. Excel calculates the unbiased sample variance with VAR.S (ДИСП.В).

Select the desired option (population or sample), specify the range and click "OK". The resulting value may be very large because the deviations are squared. Variance is a very important indicator in statistics, but it is usually not used on its own; instead, it serves as input for further calculations.

Standard deviation

The standard deviation is the square root of the variance. It is also called the root-mean-square (RMS) deviation and is calculated by the formula:

for the population:

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

for a sample:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

You can simply take the square root of the variance, but Excel also has ready-made functions for the standard deviation: STDEV.P and STDEV.S (СТАНДОТКЛОН.Г and СТАНДОТКЛОН.В in the Russian-language version), for the population and the sample, respectively.

The standard deviation and the root-mean-square deviation, I repeat, are synonyms.

Next, as usual, specify the desired range and click "OK". The standard deviation has the same units of measurement as the analyzed indicator, so it is comparable with the original data. More on that below.

The coefficient of variation

All the indicators discussed above are tied to the scale of the original data and do not give an intuitive sense of how much the analyzed population varies. To obtain a relative measure of data scatter, use the coefficient of variation, which is calculated by dividing the standard deviation by the mean. The formula for the coefficient of variation is simple:

$$V = \frac{\sigma}{\bar{X}}$$

Excel has no ready-made function for the coefficient of variation, but that is not a big problem: just divide the standard deviation by the mean. To do this, enter in the formula bar:

=STDEV.P()/AVERAGE()

Put the data range inside the parentheses. If necessary, use the sample standard deviation (STDEV.S) instead.
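
Outside Excel, the same division can be sketched in Python. The function name is hypothetical. It uses statistics.pstdev by default, mirroring STDEV.P, and statistics.stdev with sample=True, mirroring STDEV.S. The dog heights from the first part serve as input.

```python
import statistics


def coefficient_of_variation(values, sample=False):
    """Standard deviation divided by the mean (population form unless sample=True)."""
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)


heights_mm = [600, 470, 170, 430, 300]  # the dog heights from the first part
print(f"{coefficient_of_variation(heights_mm):.1%}")  # 37.4%
```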

The coefficient of variation is usually expressed as a percentage, so the cell with the formula can be given a percentage format. The corresponding button (Percent Style) is on the Home tab of the ribbon.

You can also change the format from the context menu: select the cell, right-click it and choose the cell format.

Unlike the other measures of spread, the coefficient of variation is used as an independent and very informative indicator of data variation. A common rule of thumb in statistics is that a data set with a coefficient of variation below 33% is homogeneous, and one above 33% is heterogeneous. This is useful for a preliminary description of the data and for identifying opportunities for further analysis. In addition, because it is expressed as a percentage, the coefficient of variation makes it possible to compare the dispersion of different data sets regardless of their scale and units of measurement. A useful property.

Oscillation coefficient

Another measure of data scatter is the oscillation coefficient: the ratio of the range of variation (the difference between the maximum and minimum values) to the mean. Excel has no ready-made formula for it, so you have to combine three functions, MAX, MIN and AVERAGE:

=(MAX()-MIN())/AVERAGE()

The oscillation coefficient indicates the degree of variation relative to the mean, which can also be used to compare different datasets.
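
The same ratio in Python uses the built-in max and min and statistics.mean. This is a sketch with a hypothetical function name, and the dog heights are reused as input.

```python
import statistics


def oscillation_coefficient(values):
    """Range of variation (max - min) divided by the arithmetic mean."""
    return (max(values) - min(values)) / statistics.mean(values)


heights_mm = [600, 470, 170, 430, 300]
print(f"{oscillation_coefficient(heights_mm):.1%}")  # 109.1%
```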

In general, Excel makes many statistical indicators very easy to calculate. If something is unclear, you can always use the search box in the Insert Function dialog. And Google is there to help.

Excel is highly valued by both professionals and amateurs, because users of any level can work with it. For example, anyone with minimal Excel skills can draw a simple chart, build a decent table, etc.

At the same time, this program lets you perform various kinds of calculations, for example statistical ones, but that requires a somewhat higher level of skill. However, if you have just started getting to know this program and are interested in anything that will help you become a more advanced user, this article is for you. Today I will explain what the standard deviation formula in Excel is, why it is needed at all and when it is applied. Let's go!

What it is

Let's start with theory. The standard deviation is the square root of the arithmetic mean of the squared differences between the individual values and their arithmetic mean. It is usually denoted by the Greek letter sigma (σ). In Excel, the standard deviation is calculated with the STDEV function, so the program does the work for you.

The point of this concept is to reveal the degree of variability of an instrument, for example a financial instrument's price. In its own way, it is an indicator from descriptive statistics that shows the instrument's volatility over a given time period. The STDEV function estimates the standard deviation of a sample; logical and text values are ignored.

Formula

Excel calculates the standard deviation with a built-in formula. To find it, open the formulas section in Excel and select the function named STDEV. It's very simple.

A window will then appear in which you enter the data for the calculation. Numbers or cell ranges go into the Number1, Number2, ... fields, and the program then automatically calculates the standard deviation of the sample.

Mathematical formulas and calculations are undoubtedly a rather complicated subject, and not every user can handle them right away. However, if you dig a little deeper and look at the details, it turns out that things are not so bad. I hope the example of calculating the standard deviation has convinced you of this.
