Subquery Predicates

In the WHERE clause, in addition to the listed predicates, expressions with the SELECT statement can also be used. Any expression that begins with a SELECT statement is a database query. If the expression contains at least one more SELECT statement, then it specifies a query nested in the first one. Subqueries are also called subqueries.

The subquery is a normal query, just like the ones discussed earlier. It returns a table (set of records), which, one way or another, is used to form the answer to the main query. So, for example, a subquery is used when, in order to select data in one table, it is necessary to perform checks against another table. The following special predicates are suitable for this purpose.


The predicates ALL ( all ), SOME ( some ), ANY ( any ) are actually quantifiers, known in mathematical logic as universal and existential quantifiers. ALL is a universal quantifier , and SOME and ANY, which are synonyms in SQL, are existential quantifiers . Note that in translation into Russian the word ANY should be understood as a universal quantifier (“any” means “all”), however, in English there are different meanings of this word. The use of the ALL keyword should be understood as “for all” or “for each”. The keywords SOME and ANY should be understood as “at least one”. However, in SQL, the keywords SOME and ANY have the same meaning, different from ALL.

Note. Expressions with the keywords ALL, SOME (ANY) correspond to logical expressions with quantifiers and, as such, may be called predicates.


Data processing often consists of several steps. So, first, some data sampling is performed, and then some manipulations are performed with it. However, when executing a select request, we cannot always be sure that the response contains at least one non-empty line. If the response to the request is empty, then it is pointless to perform further data processing. Thus, it is useful to know if the response to a request contains any data. This is what the EXISTS predicate is for. It becomes true only if the result table received in response to the query contains at least one record.


The UNIQUE ( unique ) predicate has the same meaning as EXISTS, but requires that all records in the result table not only exist, but must also be unique (i.e., not repeated) to be true.


The DISTINCT predicate is almost the same as UNIQUE. The difference between these predicates is found in relation to NULL values. Thus, if all records in the result table are unique (the UNIQUE predicate is true), then the DISTINCT predicate is also true (i.e., if all records are unique, then they are different). On the other hand, if the result table has at least two undefined records, then the DISTINCT predicate is false, although the UNIQUE predicate is true.


The OVERLAPS ( overlaps ) predicate is used to determine if two time intervals overlap. A time interval can be specified in two ways: as a start and end time, or as a start and duration. Examples of setting a time interval:

· (TIME ’12:25:30′, TIME ’14:30:00′) —interval specified by start and end times;

· (TIME ’12:45:00′, INTERVAL ‘2’ HOUR)—interval specified by start time and duration in hours.

An expression with the OVERLAPS predicate can be written, for example, like this:

(TIME ’12:25:30′, TIME ’14:30:00′) OVERLAPS (TIME 42:45:00′, INTERVAL ‘2’ HOUR)

Because the time intervals in this example overlap, the OVERLAPS predicate returns true.


The MATCH predicate is used to check that referential integrity is preserved when data is modified , i.e., when records are added, changed, and deleted.


The SIMILAR ( similar ) predicate is used to test for partial matching of character strings. The same problem can also be solved using the LIKE predicate, but in some cases SIMILAR is more efficient.

Suppose a table has an OS column containing the names of operating systems. You need to select records that match Windows NT, Windows XP, and Windows 98. Then you can use the following WHERE clause in your query expression:

WHERE OS SIMILAR TO ‘(Windows (NT|XP|98))’;

GROUP BY statement

The GROUP BY operator ( group by ) is used to group records by the values of one or more columns. If the SQL expression uses a WHERE clause that specifies a record filter, then the GROUP BY clause is found and executed after it. To determine which records should be in groups, use the HAVING clause, used in conjunction with GROUP BY. If the HAVING clause is not used, then all records filtered by the WHERE clause are grouped. If WHERE is not used, then all records in the source table are grouped.

Let’s say that based on a table about customers, you want to group data on the amount of customer orders by region. To do this, you can use the following SQL statement:

SELECT Region, Order_Amount FROM Customers GROUP BY Region;

On fig. Figure 4 shows the resulting table against the background of the original Clients table. Please note that entries with the same region names are located next to each other (in the same group).

Rice. 4. The result of the request for the amount of orders grouped by region

Rice. 5. The result of the request for total orders by region

To get a table in which order amounts are summarized by region, you will need to use the SUM () summary function and group by region:

SELECT Region, SUM(Order_Amount) FROM Customers

GROUP BY Region;

Here, in the SELECT statement, the usual column of the Customers table and the final function SUM () are specified, which calculates the sum of the values of the Order_Amount column. Since the grouping is specified by the Region column, the SUM (Order_Amount) function calculates the Sum of the values of the Order_Amount column for each value of the Region column. Figure 5 shows the resulting table against the background of the original Customers table. Note that the names of the regions are not repeated in this table.

The GROUP BY operator collects records into groups and arranges (sorts) the groups alphabetically (more precisely, by ASCII character codes). This circumstance should be borne in mind before deciding to use the ORDER BY sort operator.

HAVING statement

The HAVING operator ( having, provided ) is usually used in conjunction with the GROUP BY operator to filter records in groups. The rules for its formation are the same as for the WHERE clause.

Suppose that from the Customers table you want to select data on regions and order amounts, grouped by regions and those in which the order amount exceeds 500. In other words, you want to group data with a restriction on the records included in the groups. The query that performs this task looks like this:

SELECT Region, Order_Amount FROM Customers

GROUP BY Region, Order_Amount

HAVING Order_Sum > 500;

Rice. 6. Query result grouped by regions
and limit on the amount of orders

If there is no GROUP BY clause in the SQL expression, then the HAVING clause applies to all records returned by the WHERE clause. If WHERE is also missing, then HAVING affects all records in the table.

ORDER BY statement

The ORDER BY ( sort by ) operator is used to order (sort) records. If it is used in a request, then at the very end of the request. This operator sorts the rows of the entire table or its individual groups (in the case of using the GROUP BY operator). If there is no GROUP BY clause in the query expression, then the ORDER BY clause treats all records in the table as one group.

The ORDER BY keyword is followed by the column by whose values you want to sort. After the column name, you can specify a keyword that specifies the sort order (mode):

ASCascending (ascending). This value is the default, so if sorting is required, for example, in alphabetical order, then you do not need to specifically specify the order;

DESCdescending (descending).

If multiple sort columns are specified in the ORDER BY clause, the records are first ordered by the values of the first column, then for each value of the first column, the records are ordered by the values of the second column, and so on. The columns in the list are separated by commas, as usual. Thus, a hierarchical system for sorting the records of the result table is created.

The following example sorts the source customer table data by region and by customer name (Figure 7). In this case, sorting by customer names is performed in descending order, i.e., in the opposite order of alphabetical order.


ORDER BY Region, Name DESC;

Rice. 7. The result of sorting by regions in alphabetical order
and by customer names in reverse order

Logical operators

Logical expressions in WHERE and HAVING statements can be complex, i.e., consist of two or more simple expressions connected by AND and/or OR logical operators (unions). The AND operator acts as a logical union AND, and the OR operator acts as an OR union. So, if x and y are two logical expressions, then the compound expression x AND y evaluates to true (TRUE) only when x and y are both true; otherwise, the expression x and y evaluates to false (FALSE). The expression x OR y is true if at least one of the expressions, x or y, is true; if x and y are both false, then the compound expression x OR y is false.

The logical NOT operator applies to a single expression (possibly complex) to the right of it. This operator reverses the value of an expression. Thus, if x is true, then NOT x is false, and conversely, if x is false, then NOT x is true.

Let’s assume that from the Customers table (see Fig. 1) you want to return records about customers from Moscow and the North-West. The corresponding query looks like:

SELECT Region, Name, Order_Amount FROM Customers

WHERE Region=’Moscow’ OR Region=’Northwest’;

Please note that the logical operator OR (OR) is used here, and not AND (AND), since we need clients living either in Moscow or in the North-West. If we used AND instead of the OR operator, we would get an empty table, since there is not a single record in the original table in which the same column would have different values.

Attention. Be careful when formulating a query in natural language and when translating it into SQL.

The following SQL statement is equivalent to the one discussed earlier. It is based on the use of the IN operator:

SELECT Region, Name, Order_Amount FROM Customers

WHERE Region IN (‘Moscow’, ‘Northwest’);

If you want to get data on all clients who do not live in Moscow or the Northwest, then you can use the following SQL statement:

SELECT Region, Name, Order_Amount FROM Customers

WHERE NOT (Region=’Moscow’ OR Region=’Northwest’);

This expression is equivalent to the following two:

SELECT Region, Name, Order_Amount FROM Customers

WHERE Region < > ‘Moscow’ AND Region < > ‘North-West’;


SELECT Region, Name, Order_Amount FROM Customers

WHERE Region NOT IN (‘Moscow’, ‘Northwest’);


Select as the source table Customers shown in fig. 1. You can not copy its contents exactly, but create a similar table yourself. It is important that the table has character (text) and numeric columns. It would be nice if some columns would have the same values, for example, the Region column. In the tasks proposed below, it is required to form SQL expressions that provide a certain selection of records.

When forming the conditions for searching (selecting) records, try to abstract from specific data in the existing table, since the user can add new records and modify existing ones, doing it the way he wants and as the restrictions for this table allow. For example, the Address column is of character type. This means that it contains arbitrary character data, the structure of which is supported by the user at the semantic level. The fields in this column may or may not contain a postal code, and the street name may be preceded by “st.” or “street”, or nothing may be specified, the phone number may have different meanings, contain or not contain an area code, hyphens, etc. When formulating a request in such situations, you should take into account as many semantic nuances as possible.

Task 1

Select records that are grouped by region and exclude the Northwest region. Try to do it in at least two ways (using WHERE and HAVING).

Task 2

Select records for customers who live in cities that end in “burg” and have an order value greater than 2000.

Task 3

Select entries that have five-digit phone numbers. At the same time, it should be provided that telephone numbers may or may not contain hyphens, and the number itself may or may not contain an area code, enclosed or not enclosed in parentheses.

Be First to Comment

Leave a Reply

Your email address will not be published.