Query optimizer
From Biocrawler, the free encyclopedia.
The query optimizer is the component of a database management system that is used to analyze queries submitted to database server for execution, and then determine the optimal way to execute the query (Query plan).
The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to database server, and parsed by the parser, they are then passed to the query optimizer where optimization occurs.
Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes -- those are nodes whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted).
Join Optimization
Most query optimizers determine join order via a dynamic programming algorithm derived from the IBM System R database system. This algorithm works in two stages:
- First, all ways to access each relation (aka table) in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an index on a relation that can be used to answer a predicate in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted order.
- The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the DBMS. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order.
- Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.
In this manner, a query plan is eventually produced that joins all the queries in the relation. Note that the algorithm keeps track of the sort order of the result set produced by a query plan. This is to avoid a redundant sort operation later on in processing the query.
Historically, System-R derived query optimizers would often only consider left-deep query plans. This heuristic is drawn from the observation that join algorithms such as nested loops only require a single tuple (aka Row (database)) of the outer relation at a time. Therefore, a left-deep query plan means that fewer tuples need to be held in memory at any time: the outer relation's join plan need only be executed until a single tuple is produced, and then the inner base relation can be scanned (this technique is called "pipelining"). This heuristic reduces the number of plans that need to be considered, but may result in not considering the optimal query plan.
See also
---

