Variation rows. average values

The set of values ​​of the parameter studied in a given experiment or observation, ranked by magnitude (increase or decrease) is called a variation series.

Let's assume that we measured the blood pressure of ten patients in order to obtain an upper BP threshold: systolic pressure, i.e. only one number.

Imagine that a series of observations (statistical population) of arterial systolic pressure in 10 observations has the following form (Table 1):

Table 1

Components variation series are called options. Variants represent the numerical value of the trait being studied.

The construction of a variational series from a statistical set of observations is only the first step towards comprehending the features of the entire set. Next, it is necessary to determine the average level of the studied quantitative trait (the average level of blood protein, average weight patients, average time to onset of anesthesia, etc.)

The average level is measured using criteria that are called averages. Average value - generalizing numerical characteristic qualitatively homogeneous values, characterizing by one number the entire statistical population according to one attribute. The average value expresses the general that is characteristic of a trait in a given set of observations.

There are three types of averages in common use: mode (), median () and average arithmetic value ().

To define any medium size it is necessary to use the results of individual observations, writing them in the form of a variation series (Table 2).

Fashion- the value that occurs most frequently in a series of observations. In our example, mode = 120. If there are no repeating values ​​in the variation series, then they say that there is no mode. If several values ​​are repeated the same number of times, then the smallest of them is taken as the mode.

Median- the value dividing the distribution into two equal parts, the central or median value of a series of observations ordered in ascending or descending order. So, if there are 5 values ​​in the variational series, then its median is equal to the third member of the variational series, if there is an even number of members in the series, then the median is the arithmetic mean of its two central observations, i.e. if there are 10 observations in the series, then the median is equal to the arithmetic mean of 5 and 6 observations. In our example.

Note important feature modes and medians: their magnitudes are not affected by the numerical values ​​of the extremes.

Arithmetic mean calculated by the formula:

where is the observed value in the -th observation, and is the number of observations. For our case.

The arithmetic mean has three properties:

The middle one occupies the middle position in the variation series. In a strictly symmetrical row.

The average is a generalizing value and random fluctuations, differences in individual data are not visible behind the average. It reflects the typical that is characteristic of the entire population.

The sum of deviations of all variants from the mean is equal to zero: . The deviation of the variant from the mean is indicated.

The variation series consists of variants and their corresponding frequencies. Of the ten values ​​obtained, the number 120 was encountered 6 times, 115 - 3 times, 125 - 1 time. Frequency () - the absolute number of individual options in the population, indicating how many times this option occurs in the variation series.

The variation series can be simple (frequencies = 1) or grouped shortened, 3-5 options each. A simple series is used with a small number of observations (), grouped - with large numbers observations().

Statistical distribution series- this is an ordered distribution of population units into groups according to a certain varying attribute.
Depending on the trait underlying the formation of a distribution series, there are attribute and variation distribution series.

The presence of a common feature is the basis for the formation of a statistical population, which is the results of a description or measurement common features research objects.

The subject of study in statistics are changing (varying) features or statistical features.

Types of statistical features.

Distribution series are called attribute series. built on quality grounds. Attributive- this is a sign that has a name (for example, a profession: a seamstress, teacher, etc.).
It is customary to arrange the distribution series in the form of tables. In table. 2.8 shows an attribute series of distribution.
Table 2.8 - Distribution of types of legal assistance provided by lawyers to citizens of one of the regions of the Russian Federation.

Variation series are feature values ​​(or ranges of values) and their frequencies.
Variation series are distribution series built on a quantitative basis. Any variational series consists of two elements: variants and frequencies.
Variants are individual values ​​of a feature that it takes in a variation series.
Frequencies are the numbers of individual variants or each group of the variation series, i.e. these are numbers showing how often certain options occur in a distribution series. The sum of all frequencies determines the size of the entire population, its volume.
Frequencies are called frequencies, expressed in fractions of a unit or as a percentage of the total. Accordingly, the sum of the frequencies is equal to 1 or 100%. The variational series allows us to evaluate the form of the distribution law based on actual data.

Depending on the nature of the variation of the trait, there are discrete and interval variation series.
An example of a discrete variational series is given in Table. 2.9.
Table 2.9 - Distribution of families by the number of rooms occupied in individual apartments in 1989 in the Russian Federation.

The first column of the table presents variants of a discrete variation series, the second column contains the frequencies of the variation series, and the third column contains the frequency indicators.

Variation series

AT population some quantitative trait is being investigated. A sample of volume is randomly extracted from it n, that is, the number of elements in the sample is n. At the first stage of statistical processing, ranging samples, i.e. number ordering x 1 , x 2 , …, x n Ascending. Each observed value x i called option. Frequency m i is the number of observations of the value x i in the sample. Relative frequency (frequency) w i is the frequency ratio m i to sample size n: .
When studying a variational series, the concepts of cumulative frequency and cumulative frequency are also used. Let be x some number. Then the number of options , whose values ​​are less x, is called the accumulated frequency: for x i n is called the accumulated frequency w i max .
An attribute is called discretely variable if its individual values ​​(variants) differ from each other by some finite amount (usually an integer). A variational series of such a feature is called a discrete variational series.

Table 1. General view of the discrete variational series of frequencies

Feature valuesx i x 1 x2 x n
Frequenciesm i m 1 m2 m n

An attribute is called continuously varying if its values ​​differ from each other by an arbitrarily small amount, i.e. the sign can take any value in a certain interval. A continuous variation series for such a trait is called an interval series.

Table 2. General view of the interval variation series of frequencies

Table 3. Graphic images of the variation series

RowPolygon or histogramEmpirical distribution function
Discrete
interval
Looking at the results of the observations, it is determined how many values ​​of the variants fell into each specific interval. It is assumed that each interval belongs to one of its ends: either in all cases the left (more often), or in all cases the right, and the frequencies or frequencies show the number of options contained in the indicated boundaries. Differences a i – a i +1 are called partial intervals. To simplify subsequent calculations, the interval variation series can be replaced by a conditionally discrete one. In this case, the mean value i-th interval is taken as an option x i, and the corresponding interval frequency m i- for the frequency of this interval.
For graphic representation of variational series, polygon, histogram, cumulative curve and empirical distribution function are most often used.

In table. 2.3 (Grouping of the population of Russia according to the size of the average per capita income in April 1994) is presented interval variation series.
It is convenient to analyze the distribution series using a graphical representation, which also makes it possible to judge the shape of the distribution. A visual representation of the nature of the change in the frequencies of the variational series is given by polygon and histogram.
The polygon is used when displaying discrete variational series.
Let us depict, for example, graphically the distribution of housing stock by type of apartments (Table 2.10).
Table 2.10 - Distribution of the housing stock of the urban area by type of apartments (conditional figures).


Rice. Housing distribution polygon


On the y-axis, not only the values ​​of frequencies, but also the frequencies of the variation series can be plotted.
The histogram is taken to display the interval variation series. When constructing a histogram, the values ​​of the intervals are plotted on the abscissa axis, and the frequencies are depicted by rectangles built on the corresponding intervals. The height of the columns in the case of equal intervals should be proportional to the frequencies. A histogram is a graph in which a series is shown as bars adjacent to each other.
Let's graphically depict the interval distribution series given in Table. 2.11.
Table 2.11 - Distribution of families by the size of living space per person (conditional figures).
N p / p Groups of families by the size of living space per person Number of families with a given size of living space Accumulated number of families
1 3 – 5 10 10
2 5 – 7 20 30
3 7 – 9 40 70
4 9 – 11 30 100
5 11 – 13 15 115
TOTAL 115 ----


Rice. 2.2. Histogram of the distribution of families by the size of living space per person


Using the data of the accumulated series (Table 2.11), we construct distribution cumulative.


Rice. 2.3. The cumulative distribution of families by the size of living space per person


The representation of a variational series in the form of a cumulate is especially effective for variational series, the frequencies of which are expressed as fractions or percentages of the sum of the frequencies of the series.
If we change the axes in the graphic representation of the variational series in the form of a cumulate, then we get ogivu. On fig. 2.4 shows an ogive built on the basis of the data in Table. 2.11.
A histogram can be converted to a distribution polygon by finding the midpoints of the sides of the rectangles and then connecting these points with straight lines. The resulting distribution polygon is shown in fig. 2.2 dotted line.
When constructing a histogram of the distribution of a variational series with unequal intervals, along the ordinate axis, not the frequencies are plotted, but the distribution density of the feature in the corresponding intervals.
The distribution density is the frequency calculated per unit interval width, i.e. how many units in each group are per unit interval value. An example of calculating the distribution density is presented in Table. 2.12.
Table 2.12 - Distribution of enterprises by the number of employees (figures are conditional)
N p / p Groups of enterprises by the number of employees, pers. Number of enterprises Interval size, pers. Distribution density
BUT 1 2 3=1/2
1 up to 20 15 20 0,75
2 20 – 80 27 60 0,25
3 80 – 150 35 70 0,5
4 150 – 300 60 150 0,4
5 300 – 500 10 200 0,05
TOTAL 147 ---- ----

For a graphical representation of variation series can also be used cumulative curve. With the help of the cumulate (the curve of the sums), a series of accumulated frequencies is displayed. The cumulative frequencies are determined by successively summing the frequencies by groups and show how many units of the population have feature values ​​no greater than the considered value.


Rice. 2.4. Ogiva distribution of families according to the size of living space per person

When constructing the cumulate of an interval variation series, the variants of the series are plotted along the abscissa axis, and the accumulated frequencies along the ordinate axis.

​ Variation series - a series in which they are compared (in ascending or descending order) options and their respective frequencies

Variants are separate quantitative expressions of a trait. Designated with a Latin letter V . The classical understanding of the term "variant" assumes that each unique value of a feature is called a variant, regardless of the number of repetitions.

For example, in a variational series of indicators of systolic blood pressure measured in ten patients:

110, 120, 120, 130, 130, 130, 140, 140, 160, 170;

only 6 values ​​are options:

110, 120, 130, 140, 160, 170.

Frequency is a number indicating how many times an option is repeated. Denoted by a Latin letter P . The sum of all frequencies (which, of course, is equal to the number of all studied) is denoted as n.

    In our example, the frequencies will take on the following values:
  • for variant 110 frequency P = 1 (value 110 occurs in one patient),
  • for variant 120 frequency P = 2 (value 120 occurs in two patients),
  • for variant 130 frequency P = 3 (value 130 occurs in three patients),
  • for variant 140 frequency P = 2 (value 140 occurs in two patients),
  • for variant 160 frequency P = 1 (value 160 occurs in one patient),
  • for variant 170 frequency P = 1 (value 170 occurs in one patient),

Types of variation series:

  1. simple- this is a series in which each option occurs only once (all frequencies are equal to 1);
  2. suspended- a series in which one or more options occur repeatedly.

The variation series is used to describe large arrays of numbers; it is in this form that the collected data of most medical studies are initially presented. In order to characterize the variation series, special indicators are calculated, including average values, indicators of variability (the so-called dispersion), indicators of the representativeness of sample data.

Variation series indicators

1) The arithmetic mean is a generalizing indicator that characterizes the size of the studied trait. The arithmetic mean is denoted as M , is the most common type of average. The arithmetic mean is calculated as the ratio of the sum of the values ​​of the indicators of all units of observation to the number of all examined. The method for calculating the arithmetic mean differs for a simple and weighted variation series.

Formula for calculation simple arithmetic mean:

Formula for calculation weighted arithmetic mean:

M = Σ(V * P)/ n

​ 2) Mode - another average value of the variation series, corresponding to the most frequently repeated variant. Or, to put it differently, this is the option that corresponds to the highest frequency. Designated as Mo . The mode is calculated only for weighted series, since in simple series none of the options is repeated and all frequencies are equal to one.

For example, in the variation series of heart rate values:

80, 84, 84, 86, 86, 86, 90, 94;

the value of the mode is 86, since this variant occurs 3 times, therefore its frequency is the highest.

3) Median - the value of the option, dividing the variation series in half: on both sides of it there is an equal number of options. The median, as well as the arithmetic mean and mode, refers to average values. Designated as Me

4) Standard deviation (synonyms: standard deviation, sigma deviation, sigma) - a measure of the variability of the variation series. It is an integral indicator that combines all cases of deviation of a variant from the mean. In fact, it answers the question: how far and how often do the options spread from the arithmetic mean. Denoted by a Greek letter σ ("sigma").

When the population size is more than 30 units, the standard deviation is calculated using the following formula:

For small populations - 30 observation units or less - the standard deviation is calculated using a different formula:

Variation series is a series of numeric values ​​of a feature.

The main characteristics of the variation series: v - variant, p - the frequency of its occurrence.

Types of variation series:

    according to the frequency of occurrence of variants: simple - the variant occurs once, weighted - the variant occurs two or more times;

    options by location: ranked - options are arranged in descending and ascending order, unranked - options are written in no particular order;

    by grouping the option into groups: grouped - options are combined into groups, ungrouped - options are not grouped;

    by value options: continuous - options are expressed as an integer and a fractional number, discrete - options are expressed as an integer, complex - options are represented by a relative or average value.

A variational series is compiled and drawn up in order to calculate average values.

Variation series notation form:

8. Average values, types, calculation method, application in health care

Average values- the total generalizing characteristic of quantitative characteristics. Application of averages:

1. To characterize the organization of the work of medical institutions and evaluate their activities:

a) in the polyclinic: indicators of the workload of doctors, the average number of visits, the average number of residents in the area;

b) in a hospital: average number of bed days per year; average length of stay in hospital;

c) in the center of hygiene, epidemiology and public health: average area (or cubic capacity) per 1 person, average nutritional standards (proteins, fats, carbohydrates, vitamins, mineral salts, calories), sanitary norms and standards, etc.;

2. To characterize physical development (the main anthropometric features of morphological and functional);

3. To determine the medical and physiological parameters of the body in normal and pathological conditions in clinical and experimental studies.

4. In special scientific research.

The difference between average values ​​and indicators:

1. The coefficients characterize an alternative feature that occurs only in some part of the statistical team, which may or may not take place.

Average values ​​cover the signs inherent in all members of the team, but to varying degrees (weight, height, days of treatment in the hospital).

2. Coefficients are used to measure qualitative features. Average values ​​are for varying quantitative traits.

Types of averages:

    arithmetic mean, its characteristics - standard deviation and average error

    mode and median. Fashion (Mo)- corresponds to the value of the trait that is most often found in this population. Median (Me)- the value of the attribute, which occupies the median value in this population. It divides the series into 2 equal parts according to the number of observations. Arithmetic mean value (M)- unlike the mode and the median, it relies on all observations made, therefore it is an important characteristic for the entire distribution.

    other types of averages that are used in special studies: root mean square, cubic, harmonic, geometric, progressive.

Arithmetic mean characterizes the average level of the statistical population.

For a simple series where

∑v – sum option,

n is the number of observations.

for a weighted series, where

∑vr is the sum of the products of each option and the frequency of its occurrence

n is the number of observations.

Standard deviation arithmetic mean or sigma (σ) characterizes the diversity of the feature

- for a simple row

Σd 2 - the sum of the squares of the difference between the arithmetic mean and each option (d = │M-V│)

n is the number of observations

- for weighted series

∑d 2 p is the sum of the products of the squares of the difference between the arithmetic mean and each option and the frequency of its occurrence,

n is the number of observations.

The degree of diversity can be judged by the value of the coefficient of variation
. More than 20% - strong diversity, 10-20% - medium diversity, less than 10% - weak diversity.

If one sigma (M ± 1σ) is added to and subtracted from the arithmetic mean, then with a normal distribution, at least 68.3% of all variants (observations) will be within these limits, which is considered the norm for the phenomenon under study. If k 2 ± 2σ, then 95.5% of all observations will be within these limits, and if k M ± 3σ, then 99.7% of all observations will be within these limits. Thus, the standard deviation is a standard deviation that allows one to predict the probability of the occurrence of such a value of the studied trait that is within the specified limits.

Average error of the arithmetic mean or representativeness error. For simple, weighted series and by the rule of moments:

.

To calculate the average values, it is necessary: ​​the homogeneity of the material, a sufficient number of observations. If the number of observations is less than 30, n-1 is used in the formulas for calculating σ and m.

When evaluating the result obtained by the size of the average error, a confidence coefficient is used, which makes it possible to determine the probability of a correct answer, that is, it indicates that the obtained value of the sampling error will not be greater than the actual error made as a result of continuous observation. Consequently, with an increase in the confidence probability, the width of the confidence interval increases, which, in turn, increases the confidence of the judgment, the support of the result obtained.

The rows built by quantity, are called variational.

The distribution series consist of options(characteristic values) and frequencies(number of groups). Frequencies expressed as relative values ​​(shares, percentages) are called frequencies. The sum of all frequencies is called the volume of the distribution series.

By type, the distribution series are divided into discrete(built on discontinuous values ​​of the feature) and interval(built on continuous feature values).

Variation series represents two columns (or rows); one of which provides individual values ​​of the variable attribute, called variants and denoted by X; and in the other - absolute numbers showing how many times (how often) each option occurs. The indicators of the second column are called frequencies and are conventionally denoted by f. Once again, we note that in the second column, relative indicators characterizing the proportion of the frequency of individual variants in the total amount of frequencies can also be used. These relative indicators are called frequencies and conventionally denoted by ω The sum of all frequencies in this case is equal to one. However, frequencies can also be expressed as a percentage, and then the sum of all frequencies gives 100%.

If the variants of the variational series are expressed as discrete values, then such a variational series is called discrete.

For continuous features, variation series are constructed as interval, that is, the values ​​of the attribute in them are expressed “from ... to ...”. In this case, the minimum values ​​of the attribute in such an interval are called the lower limit of the interval, and the maximum - the upper limit.

Interval variational series are also built for discrete features that vary over a wide range. The interval series can be equal and unequal intervals.

Consider how the value of equal intervals is determined. Let us introduce the following notation:

i– interval value;

- the maximum value of the attribute for units of the population;

- the minimum value of the attribute for units of the population;

n- the number of allocated groups.

if n is known.

If the number of allocated groups is difficult to determine in advance, then the formula proposed by Sturgess in 1926 can be recommended to calculate the optimal size of the interval with a sufficient population size:

n = 1+ 3.322 log N, where N is the number of ones in the population.

The value of unequal intervals is determined in each individual case, taking into account the characteristics of the object of study.

The statistical distribution of the sample call the list of options and their corresponding frequencies (or relative frequencies).

The statistical distribution of the sample can be specified in the form of a table, in the first column of which there are options, and in the second - the frequencies corresponding to these options. ni, or relative frequencies Pi .

Statistical distribution of the sample

Variation series are called interval series, in which the values ​​of the features underlying their formation are expressed within certain limits (intervals). Frequencies in this case do not refer to individual values ​​of the attribute, but to the entire interval.

Interval distribution series are constructed according to continuous quantitative characteristics, as well as according to discrete characteristics, varying within a significant range.

The interval series can be represented by the statistical distribution of the sample, indicating the intervals and their corresponding frequencies. In this case, the sum of the frequencies of the variant that fell into this interval is taken as the frequency of the interval.

When grouping by quantitative continuous features, it is important to determine the size of the interval.

In addition to the sample mean and sample variance, other characteristics of the variation series are also used.

Fashion name the variant that has the highest frequency.