mysql之从服务器状态变量角度看max、min优化
实验
有一张测试表user_test,表结构如下:
mysql> desc user_test;
+-------------+-------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+-------------------+-----------------------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| login_name | varchar(30) | NO | | NULL | |
| email | varchar(50) | YES | | | |
| phonenumber | varchar(11) | YES | | | |
| sex | char(1) | YES | | 0 | |
| password | varchar(50) | YES | | | |
| create_time | datetime | YES | MUL | CURRENT_TIMESTAMP | |
| update_time | datetime | YES | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------+-------------+------+-----+-------------------+-----------------------------+
表中索引如下
mysql> show index from user_test;
+-----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible |
+-----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| user_test | 0 | PRIMARY | 1 | id | A | 992906 | NULL | NULL | | BTREE | | | YES |
| user_test | 1 | idx_create_time | 1 | create_time | A | 992906 | NULL | NULL | YES | BTREE | | | YES |
+-----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
2 rows in set (0.00 sec)
表中数据量100w。
mysql> select count(*) from user_test;;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.17 sec)
接下来我从Innodb_rows_read这个MySQL服务器维护的状态变量来比较下如下两个语句的资源消耗。sql1:
select max(id) as max_id from user_test where create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d") and sex=0;
sql2:
select id as max_id from user_test where create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d") and sex=0 order by id desc limit 1;
我们知道sql1和sql2这两个sql的作用是一样的,取满足where条件的max id。注意取的是max id,不是max(create_time),而且还有附加条件status=0。
来分别看下这两个sql所带来的Innodb_rows_read增量。
sql1:
mysql> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)
mysql> select VARIABLE_VALUE into @a from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
mysql> select max(id) as max_id from user_test where create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d") and sex=0;
+--------+
| max_id |
+--------+
| 500000 |
+--------+
1 row in set (0.38 sec)
mysql> select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
mysql> select @b-@a;
+---------+
| @b-@a |
+---------+
| 1000293 |
+---------+
1 row in set (0.00 sec)
sql2:
mysql> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)
mysql> select id as max_id from user_test where create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d") and sex=0 order by id limit 1;
+--------+
| max_id |
+--------+
| 7 |
+--------+
1 row in set (0.00 sec)
mysql> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)
mysql> select VARIABLE_VALUE into @a from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
mysql> select id as max_id from user_test where create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d") and sex=0 order by id desc limit 1;
+--------+
| max_id |
+--------+
| 500000 |
+--------+
1 row in set (0.20 sec)
mysql> select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
mysql> select @b-@a;
+--------+
| @b-@a |
+--------+
| 500265 |
+--------+
1 row in set (0.00 sec)
可以看到使用了“max(id)“的sql1的Innodb_rows_read的增量为1000293,差不多是user_test表的大小,所以我们可以说max函数会造成全表扫描。使用了“order by id desc limit 1”的sql2,它的Innodb_rows_read的增量为500265,基本上是“create_time < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 7 DAY), "%Y-%m-%d")”所过滤的行数,即索引扫描行数。sql2相比sql1,性能消耗要低得多。当表中数据很大时,这个差值将更明显。
Innodb_rows_read:
官网解释:The number of rows read from InnoDB tables. 它是一种累加的算法,sql每扫描一行,该值就累加1。
Optimizer Trace 是MySQL 5.6.3里新加的一个特性,可以把MySQL Optimizer的决策和执行过程输出成文本,结果为JSON格式,兼顾了程序分析和阅读的便利。
1.performance_schema.session_status,保存当前会话的状态变量,可以利用它来统计innodb读取行数。
2.可以利用performance_schema库里面的optimizer_trace来查看语句执行的详细信息。语句:select trace frominformation_schema
.optimizer_trace
\G;
结论
针对需要使用max或者min函数的场景,可以考虑替换成order by id desc/acs limit 1。
参考:
https://mini.nidbox.com/diary/read/9903065
https://www.linuxidc.com/Linux/2018-12/156067.htm