Menu
Simba Technologies
Simba Technologies

SimbaEngine X SDK 10.1.3
Developing Drivers for Data Stores Without SQL

Algebraic Expression Tree and Optimization

Before it executes an SQL statement, the Simba SQLEngine can pass a representation of the SQL statement, called an Algebraic Expression Tree, or AE-Tree, to your DSI implementation. The SQL statement takes this form before the Simba SQLEngine transforms it into an execution plan and executes it.

The AE-Tree is optimized in a three-step process, the first two of which are handled internally by the Simba SQLEngine. First, tables are re-ordered within the query, then operations are pushed-down in the AE-Tree, and finally operations are passed down to the DSII via Collaborative Query Execution (CQE). This section briefly reviews the three steps to give developers using the Simba SQLEngine some insight into the whole optimization process. Although CQE is only involved in the third step, it is useful to first understand the first two steps that the Simba SQLEngine performs during optimization.

Note:

The AE-Tree diagrams shown in this section were generated by the SQLEngine logs that are controlled by the DSIEXT_DATAENGINE_LOG_AETREES data engine property. This property includes values for logging in four different locations, each of which corresponds to a step in the optimization process as that have been outlined above. These AE-Tree logs can be used to help understand what is going on in each case.

Step 1 – Table Reordering

Cross joins, which can result when a query is made against multiple tables, are a very common occurrence. Thus an important optimization is to attempt to change them into INNER JOINs since passing down cross joins later on could negate other optimizations. To accomplish this, the tables are first re-ordered within the AE-Tree, to allow for SELECT nodes with a child CROSS-JOIN to be converted into an INNER JOIN which can eventually be passed-down via CQE. The conversion to an INNER JOIN in itself reduces the number of rows that will be returned. Note that additional optimizations based on table statistics may be added in the future.

Consider the following example query:

SELECT EMPLOYEE.FIRST_NAME FROM EMPLOYEE, ADDRESS, DEPT WHERE EMPLOYEE.DEPT=DEPT.DEPT_ID

Before optimization, the AE-Tree looks as follows:

AEQuery

AEProject

AESelect

AECrossJoin

AECrossJoin

AETable: DBF.Shop.EMPLOYEE

AETable: DBF.Shop.ADDRESS

AETable: DBF.schema.DEPT

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.DEPT

AEValueList

AEColumn: DBF.schema.DEPT.DEPT_ID

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

In its current form, the AESelect node can’t be pushed down in a later optimization as the condition involves a table in another cross-join, and thus we can’t turn the AESelect->AECrossJoin relation into an inner join as the two tables involved in the filter are at different levels. After re-ordering the tables, we have the following AETree:

AEQuery

AEProject

AESelect

AECrossJoin

AECrossJoin

AETable: DBF.Shop.EMPLOYEE

AETable: DBF.schema.DEPT

AETable: DBF.Shop.ADDRESS

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.DEPT

AEValueList

AEColumn: DBF.schema.DEPT.DEPT_ID

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

You can see the ADDRESS and DEPT tables have swapped positions, and now the AESelect can be pushed down to involve only the two required tables, which allows the AESelect operation to be turned into an inner join which can be passed-down. Conceptually, this would be the same as the following query:

select EMPLOYEE.FIRST_NAME from DEPT, EMPLOYEE, ADDRESS WHERE EMPLOYEE.DEPT= DEPT.DEPT_ID

Step 2 – Push Down Optimization

The second step involves push-down optimizations where by filters are pushed down the AE Tree by the SQLEngine to their lowest possible location. This is not to be confused with pass-down optimization (CQE) which is the third and final step.

Push down optimization allows for data to be filtered out at the earliest possible time, reducing work for the SQLEngine. In addition, AESelect->AECrossJoin relationships are transformed into AEJoin nodes if possible. Other optimizations may also be performed during this phase, but the push down is the most significant.

To demonstrate pushing down of a simple filter, consider the following query:

select EMPLOYEE.FIRST_NAME from EMPLOYEE, ADDRESS where EMPLOYEE.FIRST_NAME = ‘Susan’

which has an AETree that looks like this before push-down:

AEQuery

AEProject

AESelect

AECrossJoin

AETable: DBF.Shop.EMPLOYEE

AETable: DBF.Shop.ADDRESS

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

AEValueList

AELiteral: Susan; Character String Literal

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

You can see that the AESelect is above the AECrossJoin, which means the condition would be applied to the result of the cross-join, and thus be applied to many more rows than was needed. After push-down the AETree looks as follows:

AEQuery

AEProject

AECrossJoin

AESelect

AETable: DBF.Shop.EMPLOYEE

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

AEValueList

AELiteral: Susan; Character String Literal

AETable: DBF.Shop.ADDRESS

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

Here you can see that the AESelect has been pushed down so the condition is only applied to the EMPLOYEE table, and the cross-join is then applied to the filtered EMPLOYEE result and the ADDRESS table, resulting in many fewer rows being processed.

To demonstrate pushing down of filters which cause an AESelect->AECrossJoin relationship to become an AEJoin, we can continue the example from the table re-order phase from above which had an AETree that looked like this:

AEQuery

AEProject

AESelect

AECrossJoin

AECrossJoin

AETable: DBF.Shop.EMPLOYEE

AETable: DBF.schema.DEPT

AETable: DBF.Shop.ADDRESS

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.DEPT

AEValueList

AEColumn: DBF.schema.DEPT.DEPT_ID

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

After the push-down phase, the tree looks like this:

AEQuery

AEProject

AECrossJoin

AEJoin: AE_INNER_JOIN

AETable: DBF.Shop.EMPLOYEE

AETable: DBF.schema.DEPT

AEComparison: EQ

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.DEPT

AEValueList

AEColumn: DBF.schema.DEPT.DEPT_ID

AETable: DBF.Shop.ADDRESS

AEValueList

AEColumn: DBF.Shop.EMPLOYEE.FIRST_NAME

What has happened here is that the AESelect was pushed down one level as the filter condition only applied to the EMPLOYEE and the DEPT table, so the ADDRESS table could be left out of the filtering. Once this happened, there was an AESelect->AECrossJoin relationship where the condition for the AESelect only involved the two child tables, and could be turned into an AEJoin.

Note that the transformation of an AESelect->AECrossJoin into an AEJoin only occurs after the initial push-down of filters occurs. This is to allow filters that might apply only to one operand of the AECrossJoin to be pushed down directly to that operand and not be applied at the join level.

Step 3 – Pass Down Optimization (CQE)

The third and final step is pass down optimization in which the Simba SQLEngine passes portions of the AE Tree to the pass down handlers implemented in your DSI. When the Simba SQLEngine passes a subtree to your DSI handlers, these handlers can choose to execute all or part of the operation reflected in that subtree.

For instance, if the data store can filter data, join data, or execute aggregate functions particularly fast, pass down handlers can be used to signal to the Simba SQLEngine that the DSII will handle all or part of the operation. In this case the handler will return an optimized result set which the Simba SQLEngine will update the AE Tree subtree with.

Note:

  • The terms Pass-down and CQE are used interchangeably.
  • Although the Simba SQLEngine passes the AE Tree to the handlers, handlers should not directly manipulate the tree. Direct manipulation should be done by implementing an IQueryExecutor as described in Pre Optimization Analysis of the AE-Tree.

Before discussing these pass down handlers, it’s important to review the four main types of AENodes in an AE Tree as these will be used as input to the various handlers: Statements; Boolean; Query Operations and Relational Expressions; and Values. Each of these is explained below.

Statements

The root node of the tree representing the type of operation being performed: Query, Procedure Call, or DML. The first two are represented by nodes of typeAEQuery andAEProcedureCall respectively, while DML statements are represented by statement nodes which subclassAERowCountStatement.

Boolean

A logical expression representing a true or false outcome. The most common use of this is for the WHERE clause of a SELECT statement.The base class of this type of node isAEBooleanExpr.

Query Operations and Relational Expressions

A representation of retrieval or manipulation of relational data such as selecting from a table. The base class of this type of node isAEQueryOperation. These nodes represent operations on the entire query result, such as sorting.

AERelationalExpr, which derives fromAEQueryOperation, is the base class of most other nodes of this type. They represent retrieval, filtering, or modification of some relational data. Typically, they take one or two other relational expressions as an operand.

Three such examples are:

  • AETable – This represents the retrieval of data from a table in your data store.
  • AESelect – This represents the WHERE clause of a query by taking another relational expression and combining it with a boolean expression to use as a filter. The operand may be as simple as an AETable or a complicated combination of many other relational expressions.
  • AEJoin – This represents any join of two tables or relational expressions and an optional boolean expression to use as a join condition.

Values

A representation of a scalar value that is either a literal, a parameter, a column reference, or an expression composed of one or more other values. The base class of this type of node isAEValueExpr. The basic value expression nodes from which other values are built from areAELiteral, AEParameter, andAEColumn. Most arithmetic value expressions derive fromAEUnaryValueExpr orAEBinaryValueExpr to represent an operation on one or two value expression operands.

 

Related Links

Overview of Collaborative Query Execution

Pass-Down Operation Handlers