Classification and structuring of information

Makarova N.V., Volkov V.B. Informatics: a textbook for universities. – St. Petersburg: Peter, 2011. 576 p.

Topic 1. PRESENTATION OF INFORMATION

The concept of information

The term “information” comes from the Latin “informatio”, which means “clarification”, “notification”, “exposition”.

There are many definitions of information. One of the founders of modern information theory, Norbert Wiener, defined it as follows: “Information is information, not matter or energy.”

Such a definition through negation seems quite complete and universal, but it is practically impossible to use it as a tool for building scientific methodology.

At the same time, methodological approaches have become widespread in modern technology, allowing the use of the concept of information and the proposed tools to study the processes occurring in technical systems, the economy, society, in living and inanimate nature.

The most famous among such approaches is the mathematical theory of Claude Shannon, which makes it possible to probabilistically substantiate the reliability of signal transmission over a communication line. In Shannon’s approach, information is a measure of reducing the uncertainty of a system.
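The idea of information as a reduction in uncertainty can be illustrated with a minimal sketch. It assumes the standard Shannon entropy of a discrete distribution, measured in bits; the function names and the eight-state system are hypothetical and chosen only for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gained(before, after):
    """Information received: uncertainty before the message minus uncertainty after it."""
    return entropy(before) - entropy(after)


# Hypothetical system with 8 equally likely states (3 bits of uncertainty).
before = [1 / 8] * 8
# A message narrows the system down to 2 equally likely states (1 bit left).
after = [1 / 2] * 2

print(information_gained(before, after))  # 2.0
```

In this sketch the message carries 2 bits of information, because it removes 2 of the 3 bits of initial uncertainty.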

There is also a thermodynamic (energy) approach that considers information as a way to reduce the entropy of the system.

The Soviet mathematician Kolmogorov proposed an algorithmic approach that makes it possible to evaluate information by the complexity of the algorithm required to process it. All these approaches tie the concept of information closely to its field of application.

From the standpoint of materialistic philosophy, information is a reflection of the real world by means of messages. A message is a form of representing information as speech, text, images, digital data, graphs, tables, etc. In a broad sense, information is a general scientific concept that covers the exchange of messages between people, the exchange of signals between animate and inanimate nature, and between people and devices.

Information is facts about objects and phenomena of the environment, their parameters, properties and state, which reduce the degree of uncertainty and incompleteness of existing knowledge.

Informatics considers information as conceptually interconnected facts and concepts that change our ideas about a phenomenon or object in the world. Along with information, the concept of data is often used in informatics. Let us show how they differ.

The data can be considered as signs or recorded observations, which for some reason are not used, but only stored. In the event that it becomes possible to use this data to reduce the uncertainty of knowledge about something, the data turns into information.

Data is information encoded in a certain way for the purpose of transmission, processing, storage or retrieval.

Example. Write ten phone numbers on a piece of paper as a sequence of ten numbers and show them to your friend. He will perceive these figures as data, since they do not provide him with any information. Then, next to each number, indicate the name of the company and the type of activity. For your friend, incomprehensible numbers will gain certainty and turn from data into information that he could later use.

When working with information, there is always its source and consumer (recipient). Ways and processes that ensure the transmission of messages from the source of information to its consumer are called information communications.

For the consumer of information, a very important characteristic is its adequacy.

The adequacy of information is a certain level of correspondence of the image created with the help of the received information to a real object, process, phenomenon, etc.

In real life, a situation is hardly possible when you can count on the complete adequacy of information. There is always some degree of uncertainty. The correctness of decision-making by a person depends on the degree of adequacy of information to the real state of an object or process.

Measures of information (p. 20-25)

Information quality

The quality of information is a set of properties that determine the ability of information to satisfy certain needs of people.

The main consumer indicators of information quality are: representativeness, meaningfulness, sufficiency, accessibility, relevance, timeliness, accuracy, reliability, stability.

The representativeness of information is related to the correctness of its selection and formation in order to adequately reflect the properties of the object. The most important thing here is the correctness of the concept on the basis of which the original concept is formulated; the validity of the selection of essential features and relationships of the displayed phenomenon.

Violation of the representativeness of information often leads to its significant errors.

The meaningfulness of information reflects the semantic capacity, which is equal to the ratio of the amount of semantic information in the message to the amount of data being processed. With an increase in the content of information, the semantic capacity of the information system increases, since in order to obtain the same information, it is necessary to convert a smaller amount of data.
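The ratio described above can be written as a small sketch. The function name and the numbers are hypothetical; both quantities are assumed to be expressed in the same arbitrary units.

```python
def semantic_capacity(semantic_information, data_volume):
    """Ratio of the semantic information in a message to the volume of data processed."""
    if data_volume <= 0:
        raise ValueError("data_volume must be positive")
    return semantic_information / data_volume


# Hypothetical messages carrying the same semantic information in different data volumes.
verbose = semantic_capacity(semantic_information=200, data_volume=1000)
concise = semantic_capacity(semantic_information=200, data_volume=400)

print(verbose, concise)  # 0.2 0.5
```

The more meaningful message reaches the same semantic information with less data, which is the effect the paragraph describes.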

The sufficiency (completeness) of information means that its composition (set of indicators) is minimal, but sufficient to make the right decision. The concept of completeness of information is connected with its semantic content (semantics) and pragmatics. Both incomplete information, that is, information insufficient for making the right decision, and redundant information reduce the effectiveness of decisions made by the user.

The accessibility of information to the user's perception is ensured by appropriate procedures for obtaining and transforming it. For example, in an information system, information is converted to an accessible and user-friendly form.

The relevance of information is determined by the degree of preservation of the value of information for management at the time of use and depends on the dynamics of changes in its characteristics, as well as on the time interval that has elapsed since the occurrence of this information.

The timeliness of information means its receipt no later than a predetermined point in time, consistent with the time of solving the task.

The accuracy of information is determined by the degree of closeness of the information received to the real state of the object, process, phenomenon, etc. For information displayed by a digital code, four classification concepts of accuracy are known:

– formal accuracy is measured by the value of the unit of the least significant digit of the number;

– the real accuracy is determined by the value of the unit of the last digit of the number, the correctness of which is guaranteed;

– maximum accuracy is the accuracy that can be obtained under the specific operating conditions of the system;

– the required accuracy is determined by the functional purpose of the indicator.

The reliability of information is determined by its ability to reflect real-life objects with the required accuracy. The reliability of information is measured by the confidence level of the required accuracy, that is, the probability that the parameter value displayed by the information differs from the true value of this parameter within the required accuracy.
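One way to make this definition concrete is to estimate reliability empirically, as the share of displayed values that fall within the required accuracy of the true value. This is only a sketch under that assumption; the function name and the readings are hypothetical.

```python
def reliability(displayed_values, true_value, required_accuracy):
    """Fraction of displayed values that lie within required_accuracy of the true value."""
    if not displayed_values:
        raise ValueError("at least one value is required")
    within = sum(
        1 for value in displayed_values
        if abs(value - true_value) <= required_accuracy
    )
    return within / len(displayed_values)


# Hypothetical readings of a parameter whose true value is 20.0.
readings = [20.1, 19.8, 20.6, 20.0, 19.4]

print(reliability(readings, true_value=20.0, required_accuracy=0.5))  # 0.6
```

Here three of the five readings are within 0.5 of the true value, so the estimated reliability at that accuracy is 0.6.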

The stability of information reflects its ability to respond to changes in the source data without violating the necessary accuracy. The stability of information, as well as representativeness, is due to the chosen method of its selection and formation.

Representativeness, meaningfulness, sufficiency, accessibility and stability are entirely determined at the methodological level of information systems development. The parameters of relevance, timeliness, accuracy and reliability are also largely determined at the methodological level, but their values are also significantly affected by how the system functions, primarily by its dependability. At the same time, the parameters of relevance and accuracy are rigidly connected with the parameters of timeliness and reliability, respectively.

Information processes

The processes associated with the search, storage, transmission, processing and use of information are called informational.

Information retrieval is the process of retrieving stored information.

The collection of information is the activity of a subject in the course of which the subject obtains information about an object of interest.

Information storage is the process of maintaining the original information in a form that ensures the issuance of data at the request of end users in a timely manner.

The method of storing information depends on its carrier (a book is kept in a library, a painting in a museum, a photograph in an album). A computer can be considered a device for compact storage of information with the ability to access it quickly.

Transfer (exchange) of information is a process during which the transmitter (source) transmits information, and the recipient (receiver) receives it.

In the process of transmitting information, a source and a receiver of information are necessarily involved. Between the source and the receiver there is a channel for transmitting information – a communication channel.

A communication channel is a set of technical devices that ensure the transmission of a signal from a source to a recipient.

An encoder is a device designed to convert the original source message to a form convenient for transmission.

A decoding device is a device for converting an encoded message into the original one (Fig. 1.1).

Human activity is always connected with the transfer of information. During transmission, information can be lost or distorted. Examples include sound distortion in a telephone, atmospheric interference in radio, image distortion or blackout in television, and transmission errors in telegraph.

Fig. 1.1. Transfer of information over a communication channel (interference acts on the channel)

Message transmission channels are characterized by bandwidth and noise immunity. Data transmission channels are divided into simplex channels, which carry information in one direction only (for example, television), and duplex channels, which can carry information in both directions (for example, telephone, telegraph). Several messages can be transmitted over a channel simultaneously. Each of these messages is isolated (separated from the others) by special filters. For example, messages can be filtered by transmission frequency, as is done in radio channels.

The bandwidth of a channel is determined by the maximum number of symbols transmitted over it per unit of time in the absence of interference. This characteristic depends on the physical properties of the channel.

To increase the noise immunity of the channel, special transmission methods are used that reduce the effect of noise. For example, redundant symbols are added. These symbols carry no content of their own, but are used to check the correctness of the message when it is received. From the point of view of information theory, everything that makes literary language colorful, flexible, rich in nuance, multifaceted and polysemantic is redundancy.
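A single even-parity bit is one of the simplest examples of a redundant symbol added to check a message on receipt. The sketch below is an illustration chosen here, not a method named in the text; the function names and the 4-bit message are hypothetical.

```python
def encode(bits):
    """Encoder: append an even-parity bit, a redundant symbol with no content of its own."""
    return bits + [sum(bits) % 2]


def is_valid(received):
    """Check on receipt: the word is accepted only if it contains an even number of 1s."""
    return sum(received) % 2 == 0


def decode(received):
    """Decoder: drop the parity bit and return the original message bits."""
    return received[:-1]


message = [1, 0, 1, 1]
sent = encode(message)      # [1, 0, 1, 1, 1]

corrupted = sent.copy()
corrupted[2] ^= 1           # interference in the channel flips one bit

print(is_valid(sent), decode(sent))  # True [1, 0, 1, 1]
print(is_valid(corrupted))           # False
```

A single parity bit only detects an odd number of flipped bits and cannot correct them; stronger codes add more redundancy to do so.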

Information processing is an ordered process of its transformation in accordance with an algorithm for solving a problem or with other formal rules.

Once the information has been processed, the result must be delivered to end users in the required form. This operation is performed as part of the information output task. Information is usually output by the computer's external devices in the form of texts and tables.

Information protection in a narrower sense is understood as preventing access to information by persons who do not have the appropriate permission (unauthorized, illegal access), unintentional or unauthorized use, modification or destruction of information.

Information protection (in the broad sense) is a set of organizational, legal and technical measures to prevent threats to information security and eliminate their consequences.

The most effective means of organizing information processes is an information system equipped with tools for the input, search, placement, processing and delivery of information. The presence of such tools is the main feature of information systems, which distinguishes them from simple accumulations of information materials. For example, a personal library in which only its owner can find anything is not an information system. In public libraries, however, the order in which books are placed is always strictly defined. Thanks to this order, the search for and issue of books, as well as the placement of new acquisitions, are carried out as standard, formalized procedures.

Classification and structuring of information

Classification is a system for distributing items (objects, phenomena, processes, concepts) into classes according to a certain feature.

Example. All information about the university can be classified according to numerous information objects, which will be characterized by common properties:

– information about students – in the form of an information object “Student”;

– information about teachers – in the form of an information object “Teacher”;

– information about faculties – in the form of an information object “Faculty”, etc.

The properties of an information object are defined by information parameters called attributes. Attributes are represented either as numerical data (for example, weight, cost, year) or as qualitative features (for example, color, car brand, surname).

An attribute is a logically indivisible information element that describes a certain property of an object, process, or phenomenon.

Example. Information about each student in the personnel department of the university is systematized and presented using the same attributes:

– full name;

– sex;

– year of birth;

– place of birth;

– address of residence;

– the faculty where the student is studying, etc.

All the listed attributes characterize the properties of the information object “Student”.
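The information object “Student” can be sketched as a record type whose fields are the attributes listed above. The field names and the sample values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """Information object "Student": each field is one logically indivisible attribute."""
    full_name: str
    sex: str
    year_of_birth: int
    place_of_birth: str
    address: str
    faculty: str


# Hypothetical record, for illustration only.
student = Student(
    full_name="Ivan Petrov",
    sex="male",
    year_of_birth=2004,
    place_of_birth="St. Petersburg",
    address="12 Example Street",
    faculty="Informatics",
)

# Every student is described by the same set of attributes.
print([field.name for field in fields(Student)])
```

Because every record has the same attributes, the same processing procedure can be applied to all students.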

In addition to identifying the general properties of an information object, classification is needed to develop rules (algorithms) and procedures for processing information represented by a set of attributes.

Example. The algorithm for processing information objects of the library fund allows you to get information about all books on a specific topic, about authors, subscribers, etc.

The algorithm for processing information objects of the company allows you to obtain information about sales volumes, profits, customers, types of products, etc.

Processing algorithms in both cases pursue different goals, process different information, and are implemented in different ways.
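The contrast between the two cases can be shown with a minimal sketch: one function selects books on a topic, the other totals sales revenue per product. The record layouts, field names and sample data are hypothetical.

```python
from collections import defaultdict


def books_on_topic(books, topic):
    """Library fund: select the titles of all books on a given topic."""
    return [book["title"] for book in books if book["topic"] == topic]


def sales_by_product(sales):
    """Company: total the sales revenue for each type of product."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale["product"]] += sale["quantity"] * sale["unit_price"]
    return dict(totals)


# Hypothetical information objects of a library fund and of a company.
books = [
    {"title": "Databases", "topic": "informatics"},
    {"title": "Organic Chemistry", "topic": "chemistry"},
    {"title": "Algorithms", "topic": "informatics"},
]
sales = [
    {"product": "cup", "quantity": 2, "unit_price": 10.0},
    {"product": "plate", "quantity": 1, "unit_price": 15.0},
    {"product": "cup", "quantity": 3, "unit_price": 10.0},
]

print(books_on_topic(books, "informatics"))  # ['Databases', 'Algorithms']
print(sales_by_product(sales))               # {'cup': 50.0, 'plate': 15.0}
```

The first function filters records by an attribute value, while the second aggregates numerical attributes, so the two algorithms differ in goal, input and implementation.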

Every country has developed and uses national, industry and regional classifiers. For example, industries, equipment, professions, units of measure, cost items, etc. are classified.

The classifier is a systematized set of names and codes of classification groups.

In classification, the concepts of “classification feature” and “value of a classification feature” are widely used; they make it possible to establish the degree of similarity or difference between objects. Another possible approach merges these two concepts into one, also called the classification feature. A synonym for a classification feature is the basis of division.

Example. Age is chosen as the classification feature, and it takes three values: up to 20 years, from 20 to 30 years, over 30 years. Alternatively, “age up to 20 years”, “age from 20 to 30 years” and “age over 30 years” can themselves be used as classification features.
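The age example can be sketched as a function that maps an object to the value of its classification feature. The text does not say which group the boundary ages 20 and 30 belong to; the sketch assumes both fall into the middle group.

```python
def age_group(age):
    """Return the value of the classification feature "age" for a given age in years.

    Assumption: ages 20 and 30 both belong to the middle group.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "up to 20 years"
    if age <= 30:
        return "from 20 to 30 years"
    return "over 30 years"


for age in (17, 20, 30, 45):
    print(age, age_group(age))
```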

Three methods of object classification have been developed: hierarchical, faceted and descriptor. These methods differ in their strategies for applying classification features.

Any classification is always relative. The same object can be classified according to different features or criteria. Often there are situations when, depending on the environmental conditions, an object can be assigned to different classification groups. These considerations are especially relevant when classifying the types of information without taking into account its subject orientation, since it can often be used in different conditions, by different consumers, for different purposes.

Table 1.1 shows one of the schemes for classifying information circulating in an organization (firm). The classification is based on the five most common features: place of origin, processing stage, display method, stability, control function.

Table 1.1. Classification of information circulating in the organization

By place of origin | By stability | By processing stage | By display method | By control function
Input              | Variable     | Primary             | Text              | Planned
Output             | Constant     | Secondary           | Graphic           | Regulatory reference
Internal           |              | Intermediate        |                   | Accounting
External           |              | Result              |                   | Operational

By place of origin, information can be divided into input, output, internal and external.

Input information is information that enters the firm or its divisions. Output information is information that goes from the firm to another firm or organization (division).

The same information can be input for one firm and output for another, the one that produces it. In relation to the control object (a firm or its division: workshop, department, laboratory), both internal and external information can be distinguished.

Internal information occurs inside the object, external – outside the object.

Example. For the company, the content of a government decree changing the level of taxes is both external information and input information. The information that the company submits to the tax inspectorate on its payments to the state budget is output information for the company and external information for the tax inspectorate.

By processing stage, information can be primary, secondary, intermediate or resultant.

Primary information is information that arises directly in the course of the object's activity and is recorded at the initial stage. Secondary information is information obtained by processing primary information; it can be intermediate or resultant. Intermediate information is used as input for subsequent calculations. Resultant information is obtained by processing primary and intermediate information and is used to develop management decisions.

Example. In an art workshop where cups are painted, the total number of items produced and the number of cups painted by each employee are recorded at the end of each shift. This is primary information. At the end of each month, the foreman totals the primary information. These totals are secondary information: intermediate on the one hand, and resultant on the other. The final data is sent to the accounting department, where each employee's wages are calculated according to output. The calculated data is resultant information.
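The workshop example can be traced as a small calculation: shift records are primary information, monthly totals are intermediate information, and wages are resultant information. The employee names, the shift figures and the piece rate are hypothetical, and the sketch assumes wages are simply proportional to the number of cups painted.

```python
def monthly_totals(shift_records):
    """Secondary (intermediate) information: cups painted by each employee over the month."""
    totals = {}
    for employee, cups in shift_records:
        totals[employee] = totals.get(employee, 0) + cups
    return totals


def wages(totals, rate_per_cup):
    """Resultant information: wages, assuming pay is proportional to output."""
    return {employee: cups * rate_per_cup for employee, cups in totals.items()}


# Primary information: (employee, cups painted) recorded at the end of each shift.
shift_records = [("Anna", 40), ("Boris", 35), ("Anna", 45), ("Boris", 30)]

totals = monthly_totals(shift_records)
print(totals)                           # {'Anna': 85, 'Boris': 65}
print(wages(totals, rate_per_cup=2.0))  # {'Anna': 170.0, 'Boris': 130.0}
```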

By display method, information is divided into textual and graphic.

Text information is a set of alphabetic, numeric and special characters with which information is presented on a physical medium (paper, an image on a display screen). Graphic information consists of various kinds of graphs, charts, diagrams, drawings, etc.

In terms of stability, information can be variable (current) or constant (conditionally constant).

Variable information reflects the actual quantitative and qualitative characteristics of the production and economic activities of the company. It can vary from case to case, both in purpose and in quantity. Examples are the number of products produced per shift, weekly costs for the delivery of raw materials, the number of serviceable machines, etc. Constant (conditionally constant) information is information that remains unchanged and is reused over a long period of time. Constant information can be reference, regulatory or planning information:

– constant reference information includes a description of the constant properties of the object in the form of features that are stable for a long time (for example: employee’s personnel number, employee’s profession, workshop number, etc.);

– constant regulatory information contains local, industry and national regulations (for example: the amount of income tax, the standard for the quality of products of a certain type, the minimum wage, the pay scale for civil servants);

– constant planning information contains planned indicators that are reused in the company (for example: a plan for the production of televisions, a plan for training specialists of a certain qualification).

Economic information is usually classified by management function into the following groups: planned, regulatory reference, accounting and operational (current).

Planned information – information about the parameters of the control object for the future period. This information is the focus of all activities of the company.

Example. The company’s planned information can include such indicators as a production plan, planned profit from sales, expected demand for products, etc.

Regulatory reference information is a variety of regulatory and reference data. It is rarely updated.

Example. Regulatory reference information at an enterprise includes:

– the time intended for the manufacture of a typical part (labor rate);

– the average daily wage of a worker by category;

– salary of an employee;

– address of the supplier or buyer, etc.

Accounting information is information that characterizes the activities of the company over a certain past period of time. Based on this information, planned information can be adjusted, the company’s business activities can be analyzed, decisions can be made on more efficient work management, etc. In practice, accounting information may take the form of bookkeeping information, statistical information or operational accounting information.

Example. Accounting information is: the number of products sold for a certain period of time; average daily load or downtime of machines, etc.

Operational (current) information is information used in operational management and characterizing production processes in the current (given) period of time. Serious requirements are imposed on operational information in terms of the speed of receipt and processing, as well as the degree of its reliability. The success of the company in the market largely depends on how quickly and efficiently it is processed.

Example. Operational information includes:

– the number of manufactured parts per hour, shift, day;

– the number of products sold per day or a certain hour;

– the volume of raw materials from the supplier at the beginning of the working day, etc.
