Methods of quantitative analysis: Estimation of confidence intervals. Confidence interval

The analysis of random errors is based on the theory of random errors, which makes it possible, with a stated level of confidence, to estimate the true value of the measured quantity and the possible errors.

The theory of random errors rests on the following assumptions:

with a large number of measurements, random errors of equal magnitude but opposite sign occur equally often;

large errors occur less often than small ones (the probability of an error decreases as its magnitude increases);

with an infinitely large number of measurements, the true value of the measured quantity is equal to the arithmetic mean of all measurement results;

the occurrence of a particular measurement result, as a random event, is described by the normal distribution law.

In practice, a distinction is made between the general population and the sample set of measurements.

The general population is the entire set of possible measurement values or possible error values.

For a sample, the number of measurements n is limited and strictly defined in each particular case. It is generally assumed that if n > 30, the mean of such a set of measurements is sufficiently close to the true value.

1. Interval Estimation Using Confidence Probability

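For a large sample and a normal distribution law, the general characteristics used to evaluate the measurements are the variance D and the coefficient of variation V:

$$D = \sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad V = \frac{\sigma}{\bar{x}}. \quad (1.1)$$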

The variance characterizes the homogeneity of the measurements: the larger the variance, the greater the scatter of the measurements.

The coefficient of variation characterizes variability: the larger the coefficient of variation, the greater the variability of the measurements relative to the mean.

To assess the reliability of measurement results, the concepts of confidence interval and confidence probability are introduced.

The confidence interval is the interval of values within which the true value of the measured quantity falls with a given probability.

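The confidence probability (reliability) of a measurement is the probability that the true value $x_0$ of the measured quantity falls within the given confidence interval, i.e. within the zone $(\bar{x} - \Delta,\ \bar{x} + \Delta)$. It is expressed as a fraction of unity or as a percentage:

$$P = \Pr(\bar{x} - \Delta < x_0 < \bar{x} + \Delta) = \Phi(t),$$

where $\Phi(t)$ is the integral Laplace function (Table 1.1).

The integral Laplace function is defined by the following expression:

$$\Phi(t) = \frac{2}{\sqrt{2\pi}}\int_0^t e^{-z^2/2}\,dz.$$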

The argument of this function, t, is called the guarantee factor.

Table 1.1

Integral Laplace function
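The values of Table 1.1 can be regenerated numerically. The sketch below assumes the integral Laplace function is taken in the two-sided form Φ(t) = (2/√(2π))∫₀ᵗ e^(−z²/2) dz, so that Φ(t) is the probability that a standard normal variable falls within (−t, t) and Φ(1.96) ≈ 0.95. Under that definition Φ(t) = erf(t/√2), and the inverse (the guarantee factor for a chosen probability) comes from the standard normal quantile. The function names `laplace_phi` and `guarantee_factor` are illustrative.

```python
import math
from statistics import NormalDist


def laplace_phi(t: float) -> float:
    """Integral Laplace function Phi(t) = 2/sqrt(2*pi) * integral_0^t exp(-z^2/2) dz."""
    return math.erf(t / math.sqrt(2))


def guarantee_factor(p: float) -> float:
    """Inverse of laplace_phi: the t for which Phi(t) = p (0 < p < 1)."""
    # Phi(t) = 2 * F(t) - 1, where F is the standard normal CDF
    return NormalDist().inv_cdf((1 + p) / 2)


# Print a short version of Table 1.1
for t in (1.0, 1.5, 1.96, 2.0, 2.58, 3.0):
    print(f"t = {t:4.2f}   Phi(t) = {laplace_phi(t):.4f}")
```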

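If a confidence probability P has been set on the basis of certain data (0.95 is a common choice), the required measurement accuracy (the width of the confidence interval) is determined from the relation Φ(t) = P: the guarantee factor t is read from Table 1.1 for the chosen P.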

Half of the confidence interval is

$$\Delta = \frac{t\,\sigma}{\sqrt{n}}, \quad (1.3)$$

where t is the argument of the Laplace function if n > 30 (Table 1.1), or Student's coefficient if n ≤ 30 (Table 1.2).

Thus, the confidence interval characterizes the measurement accuracy of a given sample, and the confidence level characterizes the measurement reliability.

Example

n measurements of the pavement strength were taken on a section of a highway, giving a mean modulus of elasticity $\bar{E}$ and a calculated standard deviation σ.

It is required to determine the attainable measurement accuracy for several values of the confidence probability P, taking the values of t from Table 1.1.

The corresponding half-widths of the confidence interval are then obtained from (1.3).

Therefore, for a given measuring instrument and method, the confidence interval widens substantially even when P is increased only slightly.
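By (1.3), for a fixed instrument and method (σ and n unchanged) the half-width Δ is proportional to t, so how many times the interval grows when P is raised depends only on the guarantee factors. A short sketch under that assumption, using the normal (Laplace) values of t; the probability levels below are illustrative and are not the ones used in the original example.

```python
from statistics import NormalDist


def guarantee_factor(p: float) -> float:
    """t such that Phi(t) = p, using the two-sided normal (Laplace) relation."""
    return NormalDist().inv_cdf((1 + p) / 2)


def width_ratio(p_low: float, p_high: float) -> float:
    """How many times the confidence interval widens when P goes from p_low to p_high.

    sigma and n cancel out because Delta = t * sigma / sqrt(n).
    """
    return guarantee_factor(p_high) / guarantee_factor(p_low)


# Illustrative levels (hypothetical, not taken from the original example)
for p_low, p_high in [(0.90, 0.95), (0.95, 0.99), (0.90, 0.99)]:
    print(f"P {p_low:.2f} -> {p_high:.2f}: interval x {width_ratio(p_low, p_high):.2f}")
```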

A confidence interval for the mathematical expectation is an interval, calculated from the data, that contains the population mathematical expectation with a known probability. The natural estimate of the mathematical expectation is the arithmetic mean of the observed values, so in the rest of this lesson we use the terms "mean" and "mean value". In problems on confidence intervals, the required answer usually reads: "The confidence interval for the mean [quantity in the specific problem] is from [lower value] to [upper value]". A confidence interval can be used to estimate not only means but also the proportion of a particular feature in the general population. Means, variance, standard deviation and the error, which lead to the new definitions and formulas, are discussed in the lesson Sample and Population Characteristics.

Point and interval estimates of the mean

If the population mean is estimated by a single number (a point), the specific mean calculated from a sample of observations is taken as the estimate of the unknown population mean. The sample mean is a random variable and, in general, does not coincide with the population mean. Therefore, when reporting the sample mean, the sampling error must be reported at the same time. The measure of sampling error is the standard error, which is expressed in the same units as the mean. Hence the notation "mean ± standard error" is often used.

If the estimate of the mean must be associated with a certain probability, the population parameter of interest has to be estimated not by a single number but by an interval. A confidence interval is an interval in which the value of the estimated population parameter lies with a certain probability P. The confidence interval that contains the population mean μ with probability P = 1 − α is calculated as follows:

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$

where $z_{\alpha/2}$ is the critical value of the standard normal distribution for the significance level α = 1 − P (for α = 0.05 it equals 1.96); it can be found in the appendix of almost any statistics textbook.

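In practice, the population mean and variance are unknown, so the sample mean $\bar{x}$ is used as the centre of the interval and the population standard deviation σ is replaced by the sample standard deviation s. Thus, in most cases the confidence interval is calculated as follows:

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}.$$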

The confidence interval formula can be used to estimate the population mean if

  • the standard deviation of the general population is known;
  • or the standard deviation of the population is not known, but the sample size is greater than 30.

The sample mean is an unbiased estimate of the population mean. The sample variance, however, is not an unbiased estimate of the population variance: to obtain an unbiased estimate, the sample size n in the denominator of the sample variance formula should be replaced by n − 1.
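The effect of the n − 1 divisor can be seen on a small data set. This sketch computes both versions by hand and compares them with Python's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1); the data values are hypothetical.

```python
import statistics


def biased_variance(xs):
    """Sample variance with divisor n (biased estimate of the population variance)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def unbiased_variance(xs):
    """Sample variance with divisor n - 1 (unbiased estimate)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(biased_variance(data), statistics.pvariance(data))   # divisor n
print(unbiased_variance(data), statistics.variance(data))  # divisor n - 1
```

For small samples the difference is noticeable (here 4 versus about 4.57); as n grows, the two estimates converge.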

Example 1. Data collected from 100 randomly selected cafes in a certain city show that the average number of employees is 10.5, with a standard deviation of 4.6. Determine the 95% confidence interval for the average number of cafe employees.

$$10.5 \pm 1.96 \cdot \frac{4.6}{\sqrt{100}} = 10.5 \pm 0.90,$$

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

Thus, the 95% confidence interval for the average number of cafe employees is from 9.6 to 11.4.

Example 2. For a random sample of 64 observations from a general population, the following totals were calculated:

sum of the observed values: $\sum x_i = 600$;

sum of squared deviations of the values from the mean: $\sum (x_i - \bar{x})^2$.

Calculate the 95% confidence interval for the expected value.

Calculate the standard deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \approx 7.72,$$

then calculate the mean:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{600}{64} = 9.375.$$

Substitute the values into the expression for the confidence interval:

$$9.375 \pm 1.96 \cdot \frac{7.72}{\sqrt{64}},$$

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

We get:

$$9.375 \pm 1.891.$$

Thus, the 95% confidence interval for the mathematical expectation is from 7.484 to 11.266.

Example 3. For a random sample of 100 observations from a general population, a mean of 15.2 and a standard deviation of 3.2 were calculated. Calculate the 95% confidence interval for the expected value, then the 99% confidence interval. If the sample size and its variability stay the same but the confidence level increases, will the confidence interval narrow or widen?

We substitute these values into the expression for the confidence interval:

$$15.2 \pm 1.96 \cdot \frac{3.2}{\sqrt{100}},$$

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

We get:

$$15.2 \pm 0.627.$$

Thus, the 95% confidence interval for the mean is from 14.57 to 15.83.

Again, we substitute these values into the expression for the confidence interval:

$$15.2 \pm 2.58 \cdot \frac{3.2}{\sqrt{100}},$$

where 2.58 is the critical value of the standard normal distribution for the significance level α = 0.01.

We get:

$$15.2 \pm 0.826.$$

Thus, the 99% confidence interval for the mean is from 14.37 to 16.03.

As you can see, as the confidence level increases, the critical value of the standard normal distribution also increases, so the endpoints of the interval move further from the mean and the confidence interval for the mathematical expectation widens.
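The large-sample formula used in Examples 1 and 3 can be checked with a few lines of Python. A minimal sketch, assuming summary statistics (mean, standard deviation, n) as input and taking the critical value from `statistics.NormalDist` instead of a printed table; the function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean: float, sd: float, n: int, p: float = 0.95):
    """Large-sample interval mean -/+ z * sd / sqrt(n), z = two-sided normal critical value."""
    alpha = 1 - p
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for p = 0.95
    half = z * sd / math.sqrt(n)
    return mean - half, mean + half


# Example 1 (cafes) and Example 3 (n = 100, mean 15.2, sd 3.2)
print(mean_confidence_interval(10.5, 4.6, 100, 0.95))
print(mean_confidence_interval(15.2, 3.2, 100, 0.95))
print(mean_confidence_interval(15.2, 3.2, 100, 0.99))
```

Because the exact critical value (about 2.576) is used rather than the rounded table value 2.58, the 99% limits come out as about 14.38 and 16.02; such last-digit differences are due only to rounding.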

Point and interval estimates of a proportion

The share of a feature in the sample can be interpreted as a point estimate of the share p of the same feature in the general population. If this estimate must be associated with a probability, the confidence interval for the share p of the feature in the general population is calculated with probability P = 1 − α:

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $\hat{p}$ is the sample share.

Example 4. Two candidates, A and B, are running for mayor of a certain city. 200 randomly selected residents were polled: 46% answered that they would vote for candidate A, 26% for candidate B, and 28% did not know whom they would vote for. Determine the 95% confidence interval for the proportion of city residents who support candidate A.
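Example 4 is left unsolved in the text. A minimal sketch of the calculation, assuming the usual normal-approximation interval p̂ ± z√(p̂(1 − p̂)/n) with p̂ = 0.46 and n = 200 (the approximation is reasonable because np̂ and n(1 − p̂) are both large); the function name is illustrative. With these inputs the interval is roughly 39% to 53%.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(share: float, n: int, p: float = 0.95):
    """Normal-approximation interval share -/+ z * sqrt(share * (1 - share) / n)."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    half = z * math.sqrt(share * (1 - share) / n)
    return share - half, share + half


# Example 4: 46% of 200 respondents support candidate A
low, high = proportion_confidence_interval(0.46, 200, 0.95)
print(f"{low:.3f} .. {high:.3f}")
```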

Confidence interval

Confidence interval: a term used in mathematical statistics for interval (as opposed to point) estimation of statistical parameters, which is preferable when the sample size is small. A confidence interval is an interval that covers the unknown parameter with a given reliability.

The method of confidence intervals was developed by the American statistician Jerzy Neyman, based on the ideas of the English statistician Ronald Fisher.

Definition

A confidence interval for a parameter θ of the distribution of a random variable X with confidence level 100p%, generated by the sample $(x_1, \ldots, x_n)$, is an interval with bounds $l(x_1, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$ that are realizations of random variables $L(X_1, \ldots, X_n)$ and $U(X_1, \ldots, X_n)$ such that

$$\mathbb{P}(L \le \theta \le U) = p.$$

The boundary points of the confidence interval are called confidence limits.

An intuition-based interpretation of the confidence interval would be: if p is large (say 0.95 or 0.99), then the confidence interval almost certainly contains the true value θ .

Another interpretation of the concept of a confidence interval: it can be considered as an interval of parameter values θ compatible with experimental data and not contradicting them.
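The frequentist meaning of the definition can be checked by simulation: if many samples are drawn and an interval with confidence level p is built from each, the share of intervals covering θ should be close to p. A minimal sketch, assuming a normal population with known σ and the interval x̄ ± zσ/√n; all parameter values and the function name are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(theta=10.0, sigma=2.0, n=25, p=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose interval [L, U] contains theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + p) / 2)
    half = z * sigma / math.sqrt(n)  # sigma treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        m = fmean(sample)
        if m - half <= theta <= m + half:
            hits += 1
    return hits / trials


print(coverage())  # should be close to 0.95
```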

Examples

  • Confidence interval for the mathematical expectation of a normal sample;
  • Confidence interval for the variance of a normal sample.

Bayesian Confidence Interval

In Bayesian statistics there is a similar definition of a confidence interval that differs in some key details. Here the estimated parameter itself is considered a random variable with a given prior distribution (uniform in the simplest case), while the sample is fixed (in classical statistics it is exactly the opposite). A Bayesian confidence interval is an interval that covers the parameter value with a given posterior probability:

$$\mathbb{P}(l \le \theta \le u \mid X = x) = p.$$

In general, classical and Bayesian confidence intervals differ. In English-language literature, the Bayesian interval is usually called a credible interval, and the classical one a confidence interval.
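The difference can be illustrated for the simplest conjugate case: a normal mean with known σ and a normal prior (an assumption chosen for illustration; a very wide normal prior stands in for the uniform one mentioned above). With a nearly flat prior the credible interval coincides numerically with the classical one; an informative prior shifts and narrows it. The data, the prior values and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def classical_interval(xbar, sigma, n, p=0.95):
    """Classical interval xbar -/+ z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


def normal_credible_interval(xbar, sigma, n, prior_mean, prior_sd, p=0.95):
    """Equal-tailed credible interval for a normal mean with known sigma
    and a normal prior N(prior_mean, prior_sd**2) (conjugate update)."""
    prior_prec = 1 / prior_sd ** 2   # precision of the prior
    data_prec = n / sigma ** 2       # precision of the sample mean
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    z = NormalDist().inv_cdf((1 + p) / 2)
    half = z * math.sqrt(post_var)
    return post_mean - half, post_mean + half


# Hypothetical data: xbar = 10.5, sigma = 4.6, n = 100
print(classical_interval(10.5, 4.6, 100))
print(normal_credible_interval(10.5, 4.6, 100, prior_mean=10.5, prior_sd=1e6))  # nearly flat prior
print(normal_credible_interval(10.5, 4.6, 100, prior_mean=8.0, prior_sd=0.5))   # informative prior
```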

Sources

Wikimedia Foundation. 2010.


Any sample gives only an approximate idea of the general population, and all sample statistical characteristics (mean, mode, variance, ...) are approximations, or estimates, of the general parameters, which in most cases cannot be calculated because the general population is inaccessible (Figure 20).

Figure 20. Sampling error

However, it is possible to specify an interval in which the true (general) value of the statistical characteristic lies with a certain probability. This interval is called the confidence interval (CI).

Thus, with a probability of 95%, the general mean lies within

$$\text{from } \bar{x} - t\,m \text{ to } \bar{x} + t\,m, \quad (20)$$

where $m = s/\sqrt{n}$ is the error of the mean and t is the tabulated value of Student's t for α = 0.05 and f = n − 1 degrees of freedom.

A 99% CI can be found in the same way; in this case t is chosen for α = 0.01.
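Equation (20) can be sketched as follows. Only a short excerpt of Student's table (f = 1 to 10, standard published two-sided values) is hard-coded, which is an assumption of this sketch; larger samples would need a full table or a statistics library. The sample values in the usage lines are hypothetical.

```python
import math
from statistics import fmean, stdev

# Two-sided Student coefficients t for f = n - 1 = 1..10 degrees of freedom
STUDENT_T = {
    0.05: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228],
    0.01: [63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.250, 3.169],
}


def mean_ci_student(sample, alpha=0.05):
    """Interval (20): mean -/+ t * m, with m = s / sqrt(n) and f = n - 1."""
    n = len(sample)
    f = n - 1
    if not 1 <= f <= 10:
        raise ValueError("this short table only covers f = 1..10")
    t = STUDENT_T[alpha][f - 1]
    m = stdev(sample) / math.sqrt(n)  # error of the mean
    xbar = fmean(sample)
    return xbar - t * m, xbar + t * m


sample = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9]  # hypothetical measurements
print(mean_ci_student(sample))              # 95% CI
print(mean_ci_student(sample, alpha=0.01))  # 99% CI
```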

What is the practical significance of a confidence interval?

    A wide confidence interval indicates that the sample mean does not accurately reflect the population mean. This is usually due to an insufficient sample size or to the heterogeneity of the sample, i.e. a large dispersion. Both produce a large error of the mean and, accordingly, a wider CI, which is a reason to return to the study planning stage.

    The upper and lower CI limits make it possible to assess whether the results are clinically significant.
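The first point can be made concrete: for a fixed confidence level and standard deviation, the half-width zσ/√n shrinks only as 1/√n, so halving the interval takes about four times as many observations. This sketch uses the normal critical value for simplicity (for small samples Student's t would also change with n); σ, the sample sizes and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(sd: float, n: int, p: float = 0.95) -> float:
    """Half-width z * sd / sqrt(n) of a large-sample interval for the mean."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return z * sd / math.sqrt(n)


sd = 5.0  # hypothetical standard deviation
for n in (25, 100, 400):
    print(n, round(ci_half_width(sd, n), 3))
```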

Let us consider in more detail the statistical and clinical significance of the results of studying group properties. Recall that the task of statistics is to detect at least some differences between general populations based on sample data. The clinician's task is to find those (not just any) differences that will help diagnosis or treatment. Statistical conclusions are not always a basis for clinical conclusions. For example, a statistically significant decrease in hemoglobin of 3 g/l is not a cause for concern. Conversely, if some health problem is not widespread at the level of the whole population, that is not a reason to ignore it.

Let us illustrate this point with an example.

The researchers wondered whether boys who had had a certain infectious disease lag behind their peers in growth. For this purpose, a sample study was carried out in which 10 boys who had had this disease took part. The results are presented in Table 23.

Table 23. Statistical results

Characteristic (cm) | Mean | Lower limit | Upper limit

These calculations show that the sample mean height of 10-year-old boys who have had this infectious disease is close to normal (132.5 cm). However, the lower limit of the confidence interval (126.6 cm) shows that, at the 95% confidence level, the true mean height of these children may correspond to "short stature", i.e. these children may be stunted.

In this example, the results of the confidence interval calculations are clinically significant.

A confidence interval gives the limiting values of a statistical quantity that, with a given confidence probability γ, will fall within this interval for a larger sample size. It is written as $P(\theta - \varepsilon < x < \theta + \varepsilon) = \gamma$. In practice, the confidence level γ is chosen sufficiently close to unity, e.g. γ = 0.9, γ = 0.95 or γ = 0.99.


Example #1. On a collective farm, 100 sheep out of a herd of 1,000 were subjected to selective control shearing. The average wool clip was found to be 4.2 kg per sheep. Determine, with a probability of 0.99, the standard error of the sample in determining the average wool clip per sheep and the limits within which the clip lies, if the variance is 2.5. The sample is non-repetitive.
Example #2. At the post of the Moscow Northern Customs, 20 samples of product "A" were taken from a batch of imported products by random repeated sampling. The check established that the average moisture content of product "A" in the sample was 6%, with a standard deviation of 1%.
Determine, with a probability of 0.683, the limits of the average moisture content of the product in the entire batch of imported products.
Example #3. A survey of 36 students showed that the average number of textbooks they read in an academic year was 6. Assuming that the number of textbooks read by a student per semester is normally distributed with a standard deviation of 6, find: A) with a reliability of 0.99, an interval estimate of the mathematical expectation of this random variable; B) the probability with which it can be claimed that the average number of textbooks read by a student per semester, calculated for this sample, deviates from the mathematical expectation by no more than 2 in absolute value.

Classification of confidence intervals

By the type of parameter being evaluated:

By sample type:

  1. Confidence interval for an infinite sample;
  2. Confidence interval for a finite sample.

Sampling is called repeated if the selected object is returned to the general population before the next one is chosen. Sampling is called non-repetitive if the selected object is not returned to the general population. In practice, one usually deals with non-repetitive samples.

Calculation of the mean sampling error for random selection

The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called the representativeness error.
Designations of the main parameters of the general and sample populations.
Sample mean error formulas:

Selection | For the mean | For the share
Repeated | $\mu = \sqrt{\dfrac{\sigma^2}{n}}$ | $\mu = \sqrt{\dfrac{w(1-w)}{n}}$
Non-repetitive | $\mu = \sqrt{\dfrac{\sigma^2}{n}\left(1 - \dfrac{n}{N}\right)}$ | $\mu = \sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)}$

Here σ² is the variance of the feature, w is the sample share, n is the sample size and N is the size of the general population.

The relationship between the sampling error limit Δ, guaranteed with some probability P(t), and the mean sampling error μ is Δ = tμ, where t is the confidence coefficient, determined from the probability level P(t) using the table of the integral Laplace function.
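A sketch of these formulas, applied to the numbers of Example #1 above (variance 2.5, n = 100, N = 1000, P = 0.99). It assumes the standard textbook forms of the error formulas, with the finite-population factor (1 − n/N) for non-repetitive selection, and takes t from the normal distribution rather than a printed Laplace table (which gives t ≈ 2.58 for P = 0.99). The function names are illustrative.

```python
import math
from statistics import NormalDist


def mean_sampling_error(variance, n, N=None):
    """Average sampling error mu for the mean.

    Repeated selection: sqrt(variance / n).
    Non-repetitive selection (N given): sqrt(variance / n * (1 - n / N)).
    """
    mu2 = variance / n
    if N is not None:
        mu2 *= 1 - n / N
    return math.sqrt(mu2)


def share_sampling_error(w, n, N=None):
    """Average sampling error for a share w, same two selection schemes."""
    return mean_sampling_error(w * (1 - w), n, N)


def error_limit(mu, p):
    """Delta = t * mu, with t from the integral Laplace function: Phi(t) = p."""
    t = NormalDist().inv_cdf((1 + p) / 2)
    return t * mu


# Example #1: N = 1000 sheep, n = 100, mean clip 4.2 kg, variance 2.5, P = 0.99
mu = mean_sampling_error(2.5, 100, N=1000)
delta = error_limit(mu, 0.99)
print(f"mu = {mu:.3f} kg, limits {4.2 - delta:.2f} .. {4.2 + delta:.2f} kg")
```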

Formulas for calculating the sample size with a proper random selection method