庖丁解牛-图解MySQL 8.0优化器查询解析篇
一 背景和架构
Setup and Resolve
-
setup_tables : Set up table leaves in the query block based on list of tables.
-
resolve_placeholder_tables/merge_derived/setup_table_function/setup_materialized_derived : Resolve derived table, view or table function references in query block.
-
setup_natural_join_row_types : Compute and store the row types of the top-most NATURAL/USING joins.
-
setup_wild : Expand all '*' in list of expressions with the matching column references.
-
setup_base_ref_items : Set query_block's base_ref_items.
-
setup_fields : Check that all given fields exists and fill struct with current data.
-
setup_conds : Resolve WHERE condition and join conditions.
-
setup_group : Resolve and set up the GROUP BY list.
-
m_having_cond->fix_fields : Setup the HAVING clause.
-
resolve_rollup : Resolve items in SELECT list and ORDER BY list for rollup processing.
-
resolve_rollup_item : Resolve an item (and its tree) for rollup processing by replacing items matching grouped expressions with Item_rollup_group_items and updating properties (m_nullable, PROP_ROLLUP_FIELD). Also check any GROUPING function for incorrect column.
-
setup_order : Set up the ORDER BY clause.
-
resolve_limits : Resolve OFFSET and LIMIT clauses.
-
Window::setup_windows1: Set up windows after setup_order() and before setup_order_final().
-
setup_order_final: Do final setup of ORDER BY clause, after the query block is fully resolved.
-
setup_ftfuncs : Setup full-text functions after resolving HAVING.
-
resolve_rollup_wfs : Replace group by field references inside window functions with references in the presence of ROLLUP.
二 详细转换过程
1 传递null到join的内表列表(propagate_nullability)
2 解析设置查询块的leave_tables(setup_tables)
SELECT
t1.c1
FROM t1,
(SELECT
t2.c1
FROM t2,
(SELECT
t3.c1
FROM t3
UNION
SELECT
t4.c1
FROM t4) AS t3a) AS t2a;
3 解析查询块Derived Table、View、Table函数 (resolve_placeholder_tables)
merge_derived 处理和合并Derived table
1)merge_derived transformation的先决条件
-
外层query block是否允许merge(allow_merge_derived)
-
外层query block为nullptr
-
外层query expression的子查询为nullptr,derived table是第一层子查询
-
外层的外层query block可以allow_merge_derived=true,或者不包括外层的外层query block话是否为SELECT/SET
-
外层lex是否可以支持merge(lex->can_use_merged()+lex->can_no_use_merged())
-
derived table是否已经被标记为需要物化materialize,比如创建视图的方法是CREATE ALGORITHM=TEMPTABLE VIEW(derived_table->algorithm == VIEW_ALGORITHM_TEMPTABLE)
-
整个dervived table所在的查询表达式单元中,不能是(Query_expression::is_mergeable() ):
-
Union查询 -
包含聚集、HAVING、DISTINCT、WINDOWS或者LIMIT
-
没有任何table list
-
HINT或者optimizer_switch没有禁止derived_merge;
-
heuristic建议合并(derived_query_expressionmerge_heuristic());
-
如果derived table包含的子查询SELECT list依赖于自己的列时,不支持;
-
如果是dependant subquery需要多次执行时,不支持;
-
derived table中如果查询块包含SEMI/ANTI-JOIN,并指定STRAIGHT_JOIN时,不支持;
-
如果合并的derived table和现有query block的leaf table count大约 MAX_TABLES时,不支持;
2)merge_derived transformation的转换过程
-
利用derived_table->nested_join结构来辅助处理OUTER JOIN的情况。
-
把derived table中的表merge到NESTED_JOIN结构体(derived_table->merge_underlying_tables())。
-
将derived table中的所有表连接到父查询的table_list列表中,同时把derived table从父查询中删除。
-
对父查询的所有相关数据结构进行重新计算(leaf_table_count、derived_table_count、table_func_count、materialized_derived_table_count、has_sj_nests、has_aj_nests、partitioned_table_count、cond_count、between_count、select_n_having_items)。
-
传播设置父查询OPTION_SCHEMA_TABLE(add_base_options())和如果是外查询JOIN的内表,传播设置nullable属性(propagate_nullability())。
-
合并derived table的where条件到外查询中(merge_where())。
-
建立对derived table需要获取的列的引用(create_field_translation())。
-
将Derived table的结构从父查询中删除(exclude_level())。
-
将derived table中的列或者表的重命名合并到父查询(fix_tables_after_pullout()/repoint_contexts_of_join_nests())。
-
因为已经把derived table中包含的表merge到了父查询,所以需要对TABLE_LIST中的表所在的位置进行重新定位(remap_tables())。
-
将derived table合并到父查询之后,需要重新修改原来derived table中所有对derived table中所有列的引用(fix_tables_after_pullout())。
-
如果derived table中包含ORDER BY语句,如果满足下列条件,derived table将会保留ORDER BY并合并到父查询中,其他情况ORDER BY将会被忽略掉:
-
如果父查询允许排序并且正好是只有derived table -
不是一个UNION
-
可以有WHERE条件,但是不能有group by或聚合函数 -
本身并不是有序的
merge_derived 图解过程
trace_derived.add_utf8_table(derived_table)
.add("select#", derived_query_block->select_number)
.add("merged", true);
trace_derived.add_alnum("transformations_to_derived_table", "removed_ordering");
setup_materialized_derived 设置物化Derived Table
-
setup_materialized_derived_tmp_table(): 设置一个临时表包含物化Derived Table的所有行数据。
-
check_materialized_derived_query_blocks(): 设置属于当前Derived Table所在的查询块结构。
trace_derived.add_utf8_table(this)
.add("select#", derived->first_query_block()->select_number)
.add("materialized", true);
setup_table_function 处理表函数
trace_derived.add_utf8_table(this)
.add_utf8("function_name", func_name, func_name_len)
.add("materialized", true);
4 将SELECT *的通配符展开成具体的fields(setup_wild)
5 建立Query_block级别的base_ref_items(setup_base_ref_items)
6 对select_fields进行fix_fields()和列权限检查(setup_fields)
7 解析和fixed_fields WHERE条件和Join条件(setup_conds)
8 解析和设置ROLLUP语句(resolve_rollup)
SELECT YEAR,
country,
product,
SUM(profit) AS profit
FROM sales
GROUP BY YEAR,
country,
product WITH ROLLUP;
+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2000 | Finland | NULL | 1600 |
| 2000 | India | Calculator | 150 |
| 2000 | India | Computer | 1200 |
| 2000 | India | NULL | 1350 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2000 | USA | NULL | 1575 |
| 2000 | NULL | NULL | 4525 |
| 2001 | Finland | Phone | 10 |
| 2001 | Finland | NULL | 10 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 2700 |
| 2001 | USA | TV | 250 |
| 2001 | USA | NULL | 3000 |
| 2001 | NULL | NULL | 3010 |
| NULL | NULL | NULL | 7535 |
+------+---------+------------+--------+
相当于做了下面的查询:
SELECT *
FROM
(SELECT YEAR,
country,
product,
SUM(profit) AS profit
FROM sales
GROUP BY YEAR,
country,
product
UNION ALL SELECT YEAR,
country,
NULL,
SUM(profit) AS profit
FROM sales
GROUP BY YEAR,
country
UNION ALL SELECT YEAR,
NULL,
NULL,
SUM(profit) AS profit
FROM sales
GROUP BY YEAR
UNION ALL SELECT NULL,
NULL,
NULL,
SUM(profit) AS profit
FROM sales) AS sum_table
ORDER BY YEAR, country, product;
+------+---------+------------+--------+
| YEAR | country | product | profit |
+------+---------+------------+--------+
| NULL | NULL | NULL | 7535 |
| 2000 | NULL | NULL | 4525 |
| 2000 | Finland | NULL | 1600 |
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2000 | India | NULL | 1350 |
| 2000 | India | Calculator | 150 |
| 2000 | India | Computer | 1200 |
| 2000 | USA | NULL | 1575 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2001 | NULL | NULL | 3010 |
| 2001 | Finland | NULL | 10 |
| 2001 | Finland | Phone | 10 |
| 2001 | USA | NULL | 3000 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 2700 |
| 2001 | USA | TV | 250 |
+------+---------+------------+--------+
9 解析和设置GROUP BY/ORDER BY语句(setup_group/setup_order)
-
m_having_cond->fix_fields : 对having条件进行fixed_fields。
-
resolve_limits : 处理OFFSET和LIMIT子句(offset_limit和select_limit的Items)。
-
setup_ftfuncs : 如果有full-text的函数,对相关Item进行fix_fields。
-
remove_redundant_subquery_clause : 对于Table Subquery的表达式,通常是IN/ANY/ALL/EXISTS/etc,如果没有聚合函数和Having子句,通常可以考虑删除不必要的ORDER/DISTINCT/GROUP BY。该函数支持三种REMOVE_ORDER | REMOVE_DISTINCT | REMOVE_GROUP,如果是SINGLEROW_SUBS的子查询,只考虑删除REMOVE_ORDER。
select c1 from t1 where t1.c2 in (select distinct c1 from t2 group by c1, c2 order by c1);
转化为 =>
select c1 from t1 where t1.c2 in (select c1 from t2);
-
处理是否可以删除不必要的distinct语句,删除的条件就是GROUP BY的列都在SELECT列表中,并且没有ROLLUP和Window函数。
is_grouped() && hidden_group_field_count == 0 && olap == UNSPECIFIED_OLAP_TYPE
SELECT DISTINCT c1, max(c2) from t1 group by c1;
10 解析和设置Window函数(Window::setup_windows1)
SELECT id,
release_year,
rating,
avg(rating) over(PARTITION BY release_year) AS year_avg
FROM tw;
+------+--------------+--------+-------------------+
| id | release_year | rating | year_avg |
+------+--------------+--------+-------------------+
| 1 | 2015 | 8 | 8.5 |
| 3 | 2015 | 9 | 8.5 |
| 2 | 2015 | 8.5 | 8.5 |
| 4 | 2016 | 8.2 | 8.3 |
| 5 | 2016 | 8.4 | 8.3 |
| 6 | 2017 | 7 | 7 |
+------+--------------+--------+-------------------+
-
遍历window函数列表,调用resolve_window_ordering来解析m_partition_by和m_order_by
-
处理inter-window的引用关系(如WINDOW w1 AS (w2), w2 AS (), w3 AS (w1)),但必须是一个有向无环图(DAG)
-
重新遍历检查是否唯一名字check_unique_name、创建window partition by和window order by的引用items
-
检查窗口函数特征(Window::check_window_functions1(THD *thd, _block *select))
-
首先判断的是当前是静态窗口还是动态窗口;静态窗口即判断了 frame 的定义是否有定义上下边界。 m_static_aggregates 为 true, 意味着是静态窗口,同时对每一个分区都可以进行一次评估。如果 ma_static_aggregates 为 false, 则进一步判断其滑动窗口使用的是基于范围还是基于行。 m_row_optimizable 基于行 m_range_optimizable 基于范围
-
获取聚合函数作为窗口函数时候窗口的特殊规格要求 wfs->check_wf_semantics1(thd, select, &reqs) 这个方法其实就是判断是不是需要row_buffer作为评估,如果我们只看当前分区的行无法进行正确的计算不需要,而需要看之后的或者之前的行,就需要使用row_buffer。
三 综述
四 参考资料
-
《Mysql derived_MySQL · 新特性分析 · 5.7中Derived table变形记》
-
《ROLLUP性能增强》
-
《WL#9236, WL#9603 and WL#9727 - Add SQL window functions to MySQL》