The second stage of any statistical study is the summary of the observation materials.
The summary is a set of operations for organizing and processing primary statistical data in order to identify the typical features and patterns inherent in the studied phenomenon as a whole.
The task of the summary is to characterize the subject under study with the help of a system of statistical indicators, to identify and measure its essential features and patterns.
A summary can be understood in a narrow and in a broad sense. In the narrow sense, it is a mechanical systematization of data, i.e. the calculation of group and overall totals. In the broad sense, it is a meaningful process of generalizing the results of observation: it includes not only the calculation of group and overall totals, but also the grouping of population units, the characterization of these groups by a system of indicators, and the construction of tables and graphs.
The success of a statistical summary largely depends on its program and plan.
The statistical summary program contains:
– a list of groups into which the population will be divided;
– the boundaries of the allocated groups according to the established characteristics of the grouping;
– a system of indicators used to characterize the population as a whole and its individual parts and the methodology for their calculation;
– a system of layouts for the working tables.
The plan of the summary, along with the program, provides for its organization. The plan specifies:
– the sequence and timing of the individual parts of the summary;
– performers of certain types of work;
– the order in which the summary results are presented.
The main element of the statistical summary is the statistical grouping. Grouping is what makes it possible to systematize the observation material, prepare it for analysis and, ultimately, reveal patterns in the studied phenomena and processes.
Groupings and their types
Grouping is the distribution of units of a population according to one or more essential characteristics into homogeneous groups that differ from each other in qualitative and quantitative terms and make it possible to single out socio-economic types, study the structure of the population, or analyze the relationships between individual characteristics.
Groupings, unlike other statistical methods, perform two functions: firstly, they are an independent method of studying socio-economic phenomena and processes; secondly, they are a technique that determines the boundaries and possibilities of using other statistical methods (averages, analysis of variance, correlation and regression analysis, etc.).
When grouping population units, the following requirements must be observed:
– the number of units in groups should be sufficient to obtain reliable characteristics;
– units in the formed groups should be statistically homogeneous with respect to the grouping characteristic;
– the selected groups should differ significantly from each other in terms of the size of the grouping characteristic.
Groupings are classified according to various criteria:
1. According to the number of grouping characteristics, groupings are divided into simple and complex.
Simple (one-dimensional) groupings are carried out according to one characteristic (for example, the distribution of students by gender).
Complex groupings (combinational or multidimensional) are carried out according to two or more characteristics.
If groups formed according to one characteristic are then divided into subgroups according to a second, and so on, with the characteristics taken in combination, the grouping is called combinational (for example, the distribution of students by gender and marital status).
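The difference between a simple and a combinational grouping can be sketched in a few lines of Python. The student records below are hypothetical and serve only as an illustration; they are not data from the text.

```python
from collections import Counter

# Hypothetical records: (gender, marital status) for each student.
students = [
    ("F", "single"), ("M", "single"), ("F", "married"),
    ("M", "married"), ("F", "single"), ("M", "single"),
]

# Simple grouping: one characteristic (gender).
by_gender = Counter(gender for gender, _ in students)

# Combinational grouping: gender, then marital status within each gender.
by_gender_and_status = Counter(students)

for (gender, status), count in sorted(by_gender_and_status.items()):
    print(gender, status, count)
```

Each key of the second counter is a combination of both characteristics, so its subgroup counts add up to the counts of the simple grouping.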
Multidimensional grouping is based on measuring the similarity or difference between objects (units): units assigned to one group (class) differ less from each other than units assigned to different groups (classes). Various criteria (Euclidean distance, etc.) serve as a measure of proximity (similarity) between objects. An example of such a grouping is the distribution of banks by a number of indicators used to assess the bank’s rating.
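As a rough illustration of a proximity measure, the sketch below assigns each unit to the nearest of two group centres by Euclidean distance. The bank indicators, their scaling and the centres are all hypothetical assumptions; real multidimensional groupings usually rely on dedicated cluster-analysis procedures.

```python
import math

# Hypothetical bank indicators, assumed to be already scaled to comparable units.
banks = {"A": (1.0, 1.2), "B": (1.1, 1.0), "C": (4.0, 4.2), "D": (3.9, 4.1)}

# Hypothetical group centres chosen by the analyst.
centres = {"low": (1.0, 1.0), "high": (4.0, 4.0)}

def nearest_group(point):
    """Return the label of the centre with the smallest Euclidean distance."""
    return min(centres, key=lambda label: math.dist(point, centres[label]))

groups = {name: nearest_group(values) for name, values in banks.items()}
print(groups)
```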
2. According to the nature of the grouped material, groupings are divided into primary and secondary.
Primary grouping is a direct grouping of statistical observation data.
Secondary grouping is a regrouping of previously grouped data. The need for secondary grouping arises in the following cases: firstly, if the existing grouping does not meet the objectives of the study in terms of the number of groups; secondly, when data for different territories must be compared but the primary groupings were made according to different grouping characteristics or with different intervals.
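A minimal sketch of a secondary grouping by merging intervals is given below. The primary intervals and counts are hypothetical, and the approach assumes that every new boundary coincides with an old one; when it does not, the counts of a split interval have to be apportioned, which this sketch does not attempt.

```python
# Hypothetical primary grouping: (lower bound, upper bound) -> number of units.
primary = {(0, 10): 4, (10, 20): 6, (20, 30): 5, (30, 40): 3, (40, 50): 2}

# New, wider intervals required by the study.
new_bounds = [(0, 20), (20, 50)]

secondary = {}
for low, high in new_bounds:
    # Sum the primary groups that lie entirely inside the new interval.
    secondary[(low, high)] = sum(
        count for (a, b), count in primary.items() if low <= a and b <= high
    )

print(secondary)
```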
3. Depending on the type of tasks being solved, there are typological, structural and analytical groupings.
A typological grouping is one that identifies socio-economic types. Examples are the distribution of the employed population of Ukraine by areas of activity, the distribution of enterprises by form of ownership, etc.
A structural grouping divides the units of a homogeneous population into groups that characterize its structure according to certain characteristics. Examples are groupings of students by age, scholarship amount, etc.
An analytical grouping is carried out according to two or more characteristics and aims to identify the relationship between them (for example, the relationship between the average score in the previous session and the size of the scholarship). In an analytical grouping, one first chooses two characteristics (indicators), one of which depends on the other. The characteristic on which the other depends is called the factor characteristic (the average score in a session). The characteristic that depends on the factor is called the resulting (dependent) characteristic (the size of the scholarship).
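The idea of an analytical grouping can be sketched as follows: units are grouped by the factor characteristic, and the mean of the resulting characteristic is computed for each group. The scores, scholarship amounts and score bands are hypothetical values chosen for illustration.

```python
from statistics import mean

# Hypothetical data: (average score in the session, scholarship amount).
students = [(3.2, 0), (3.8, 800), (4.1, 1000), (4.4, 1000), (4.7, 1400), (4.9, 1400)]

# Groups by the factor characteristic; each band is half-open: low <= score < high.
bands = [(3.0, 4.0), (4.0, 4.5), (4.5, 5.0)]

mean_scholarship = {}
for low, high in bands:
    amounts = [amount for score, amount in students if low <= score < high]
    mean_scholarship[(low, high)] = mean(amounts)

for (low, high), value in mean_scholarship.items():
    print(f"score {low}-{high}: mean scholarship {value}")
```

If the group means of the resulting characteristic change systematically from group to group, this points to a relationship between the two characteristics.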
When applying the grouping method, the following main questions are solved:
1. Selecting the grouping characteristic.
2. Determining the number of groups into which the population will be divided.
3. Establishing the boundaries of the groups.
Let’s take a closer look at how each of these issues is addressed.
Groups are formed either by qualitative (sex, marital status, form of ownership, etc.) or by quantitative (age, income, etc.) characteristics.
When grouping according to a qualitative (attributive) attribute, the number of groups and their name is determined by the content of the grouping attribute itself. The boundaries of the intervals of such a grouping are set in such a way as to achieve the ultimate goal of the grouping, namely, within groups, the units must be qualitatively homogeneous, but the groups must differ from each other. It is necessary to find such values of the attribute levels, the transition through which means the transition to a different socio-economic type. A limited number of groups can be formed on the basis of gender, marital status, education, etc.
If an attributive characteristic has a large number of varieties (occupation, form of ownership, etc.), classifications are used.
Classification is a generally accepted methodological standard for dividing the population into homogeneous groups, established for a certain period of time (for example, the classification of forms of ownership, types of economic activity, etc.). It is not always possible to draw a clear line between classification and grouping, since they perform the same type of functions. In classification, in contrast to grouping, grouping characteristics are predetermined, and the question of their choice disappears; the requirements and conditions for classifying population units to a particular group are clearly formulated.
When grouping according to a quantitative characteristic, the number of groups and the boundaries of the groups depend, firstly, on the limits of variation of the grouping characteristic and, secondly, on the number of units in the studied population. The relationship between the number of groups (n) and the number of population units (N) is expressed by the Sturges formula: n = 1 + 3.322 lg N. This relationship can serve as a guideline for determining the number of groups when the distribution of population units by the given characteristic is close to normal and equal intervals are used. The Sturges formula thus ties the recommended number of groups to the number of population units: the larger the population, the more groups, although their number grows only logarithmically.
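As a rough illustration of the Sturges formula, the sketch below computes the suggested number of groups for a few population sizes. Rounding to the nearest integer is an assumption made here; the text does not state a rounding rule, and the population sizes are arbitrary.

```python
import math

def sturges_groups(population_size: int) -> int:
    """Number of groups by n = 1 + 3.322 * lg(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(population_size))

for size in (20, 50, 100, 500):
    print(size, sturges_groups(size))
```

For N = 20 the formula gives about 5.3, i.e. five groups.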
When determining the number of groups, it is necessary to strive to ensure that the features of the phenomenon under study do not disappear. Practice shows that the number of groups should not be very large, but not very small either. It should also be remembered that a sufficiently large number of population units must fall into each group.
When deciding on the number of units in groups, one must be guided not by formal considerations, but by knowledge of the essence of the phenomenon under study. If there are few groups, then within each group there will be clearly heterogeneous units. If the number of groups is relatively large, then there may be groups with a small number of units (non-representative).
After determining the number of groups when grouping by quantitative characteristics, the question arises about the size of the interval.
The interval is the difference between the largest and smallest values of features in each group. Depending on the nature of the distribution of population units according to the trait under study, groups can be distinguished with equal and unequal intervals.
Equal intervals are used when the variation of a characteristic stays within relatively narrow boundaries and the distribution of population units is more or less uniform (for example, the distribution of students by age, workers by length of service, etc.). The width of the interval (h) when grouping with equal intervals is determined by the formula:
$$h = \frac{x_{\max} - x_{\min}}{n},$$
where $x_{\max}$ and $x_{\min}$ are, respectively, the maximum and minimum values of the grouping characteristic, and $n$ is the number of groups.
It is technically more convenient to deal with equal intervals, but this is far from always possible due to the properties of the phenomena and processes being studied. Therefore, often in studies, when the variation of a trait is carried out unevenly and within very wide limits, they resort to unequal intervals (progressively increasing or specialized), which is due to the nature of the phenomena being studied. An example of such groupings is the distribution of enterprises by the amount of debt to the budget, banks by assets, etc.
Intervals can be closed (both the lower and the upper boundary are indicated) or open (only one of the boundaries is indicated). Open intervals are used only for the extreme groups. When grouping with unequal intervals, it is desirable to form groups with closed intervals.
Consider the grouping technique using a specific example.
Task. There are data for 20 stores in the city for the reporting period:
|Store No.|Retail turnover, thousand UAH|Distribution costs, thousand UAH|Store No.|Retail turnover, thousand UAH|Distribution costs, thousand UAH|
To study the relationship between the size of turnover and distribution costs, group stores according to the size of turnover, forming five groups of stores at equal intervals. Present the grouping results in the form of a table. For each group and as a whole, calculate:
1) the number of stores;
2) the size of the turnover – total and on average per store;
3) distribution costs – total and on average per store;
4) the relative level of distribution costs (the share of distribution costs in the total volume of retail turnover).
Draw your own conclusions. Plot the distribution series graphically as a histogram and a cumulative curve.
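Let’s group the stores according to the size of turnover, forming five groups with equal intervals. The interval width is determined by the formula:
$$h = \frac{x_{\max} - x_{\min}}{n},$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the grouping characteristic (retail turnover) and $n$ is the number of groups.
For this task, $n = 5$, $x_{\max} = 825$ (thousand UAH) and $x_{\min} = 200$ (thousand UAH). The width of the interval will be:
$$h = \frac{825 - 200}{5} = 125 \text{ (thousand UAH)}.$$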
Adding the width of the interval to the minimum value, we determine the upper bound of each group in turn:
I 200 + 125 = 325 (thousand UAH)
II 325 + 125 = 450 (thousand UAH)
III 450 + 125 = 575 (thousand UAH)
IV 575 + 125 = 700 (thousand UAH)
V 700 + 125 = 825 (thousand UAH)
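The same calculation can be sketched in Python. The function names are illustrative, and the rule that a value lying exactly on an inner boundary goes to the upper group is an assumption; the text does not say how boundary values are assigned.

```python
def equal_intervals(x_min, x_max, n):
    """Return the interval width and the list of (lower, upper) group bounds."""
    h = (x_max - x_min) / n
    bounds = [(x_min + i * h, x_min + (i + 1) * h) for i in range(n)]
    return h, bounds

def group_index(value, x_min, h, n):
    """0-based group of a value; inner-boundary values go to the upper group (assumed)."""
    return min(int((value - x_min) // h), n - 1)

h, bounds = equal_intervals(200, 825, 5)
print(h)
print(bounds)
print(group_index(325, 200, h, 5))
```

The `min(..., n - 1)` guard keeps the maximum value, 825, in the last group instead of opening a sixth one.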
Let’s compile and fill in an auxiliary working table to calculate the number of stores, the size of retail turnover and the distribution costs in each of the groups* (Table 3).
|Groups of stores by size of retail turnover, thousand UAH|Store No.|Retail turnover, thousand UAH|Distribution costs, thousand UAH|
|I 200 – 325|
|II 325 – 450|
|III 450 – 575|
|IV 575 – 700|
|V 700 – 825|
The results of the table will be entered into a general group table, in which, along with the calculated indicators, we will determine the size of the turnover and distribution costs on average per store, as well as the relative level of distribution costs.
Distribution of stores by size of retail turnover
|Groups of stores by size of retail turnover, thousand UAH|Number of stores|Retail turnover, thousand UAH: total|Retail turnover, thousand UAH: average per store|Distribution costs, thousand UAH: total|Distribution costs, thousand UAH: average per store|Relative level of distribution costs, %|
|A|1|2|3 = 2 : 1|4|5 = 4 : 1|6 = (4 : 2) × 100|
|200 – 325|3|723|241.00|49|16.33|6.78|
|325 – 450|4|1604|401.00|117|29.25|7.29|
|450 – 575|6|3138|523.00|199|33.17|6.34|
|575 – 700|5|3172|634.40|196|39.20|6.18|
|700 – 825|2|1561|780.50|83|41.50|5.32|
*When summarizing from cards (chips), the result can be entered into statistical tables directly, bypassing auxiliary tables.
The grouping of city stores by size of retail turnover shows that, as retail turnover grows, the relative level of distribution costs generally decreases: apart from a rise from 6.78% in the first group to 7.29% in the second, it falls steadily to 5.32% in the fifth. Thus, in the group of stores with the highest turnover, the share of distribution costs in retail turnover is the smallest.
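The relative levels in the group table can be re-derived from the printed per-store averages, since total costs divided by total turnover equals average costs divided by average turnover within a group. In the sketch below, the store counts are taken from the cumulative column of the distribution table (3, 7, 13, 18, 20), and rounding the group totals to whole thousands is an assumption.

```python
# Per-store averages from the group table: (turnover, distribution costs), thousand UAH.
averages = {
    "200-325": (241.00, 16.33),
    "325-450": (401.00, 29.25),
    "450-575": (523.00, 33.17),
    "575-700": (634.40, 39.20),
    "700-825": (780.50, 41.50),
}
# Number of stores per group, implied by the cumulative counts 3, 7, 13, 18, 20.
counts = {"200-325": 3, "325-450": 4, "450-575": 6, "575-700": 5, "700-825": 2}

relative = {}
total_turnover = 0
total_costs = 0
for label, (avg_turnover, avg_costs) in averages.items():
    # Share of distribution costs in turnover, in percent.
    relative[label] = round(avg_costs / avg_turnover * 100, 2)
    # Group totals = average * count, rounded to whole thousands (assumed).
    total_turnover += round(avg_turnover * counts[label])
    total_costs += round(avg_costs * counts[label])

print(relative)
print(total_turnover, total_costs, total_costs / total_turnover * 100)
```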
For a graphical representation of the interval distribution series as a histogram and a cumulative curve, we fill in auxiliary table 5:
Distribution of city stores by size of retail turnover
|Groups of stores by size of retail turnover, thousand UAH|Number of stores, units|Number of stores, % of the total|Cumulative number of stores|
|200 – 325|3|15|3|
|325 – 450|4|20|7 = (3 + 4)|
|450 – 575|6|30|13 = (7 + 6)|
|575 – 700|5|25|18 = (13 + 5)|
|700 – 825|2|10|20 = (18 + 2)|
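The columns of this auxiliary table can be reproduced with a short sketch. The store counts per group are those implied by the cumulative sums in the table; the text bars are only a stand-in for the histogram columns, not a replacement for the figures.

```python
from itertools import accumulate

labels = ["200-325", "325-450", "450-575", "575-700", "700-825"]
counts = [3, 4, 6, 5, 2]  # stores per group, implied by the cumulative column

total = sum(counts)
shares = [100 * c / total for c in counts]  # percent of all stores
cumulative = list(accumulate(counts))       # running totals for the cumulative curve

for label, c, share, cum in zip(labels, counts, shares, cumulative):
    # One '#' per store gives a crude text histogram.
    print(f"{label:>9} {c:>2} {share:5.1f}% {cum:>2} {'#' * c}")
```

The histogram uses the group counts as column heights, while the cumulative curve plots the running totals against the upper bound of each interval.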
Fig. 1. Histogram of the distribution of city stores by size of retail turnover.
Fig. 2. Cumulative curve of the distribution of city stores by size of retail turnover.