MySQL Index Extensions: Study Notes

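InnoDB automatically extends every secondary index by appending the primary key columns to the index columns. In more detail, besides the key values of the indexed columns, each secondary index entry also stores the primary key value of the row (not a physical pointer to the row). Why? Because InnoDB organizes table data as a clustered index, that is, rows with adjacent primary key values are stored close together (an index-organized table). Storing the primary key value instead of a row pointer removes a great deal of secondary index maintenance when rows move or pages split: InnoDB can move a row without updating the secondary indexes. The MySQL optimizer can use these extended secondary indexes to perform joins, sorting, and ref and range access more efficiently. Let's test this with the example from the official documentation (the test environment below is MySQL 5.6):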

CREATE TABLE t1 (
  i1 INT NOT NULL DEFAULT 0,
  i2 INT NOT NULL DEFAULT 0,
  d DATE DEFAULT NULL,
  PRIMARY KEY (i1, i2),
  INDEX k_d (d)
) ENGINE = InnoDB;

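As shown above, table t1 has a primary key and a secondary index k_d. Internally, InnoDB extends each entry of k_d (d) to (d, i1, i2), that is, it includes the primary key value. This is why a common design guideline is to keep primary key columns as short as possible: every secondary index carries a copy of them, so a wide primary key makes every secondary index larger.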
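
One way to observe the appended primary key columns is sketched below. It assumes persistent optimizer statistics are enabled (the default for InnoDB in recent versions) and that the current account may read the mysql.innodb_index_stats table; neither assumption comes from the original test. For k_d, the stat_description values would be expected to list the prefixes d, then d,i1, then d,i1,i2.

```sql
-- Refresh the persistent statistics so that the rows for t1 are up to date.
ANALYZE TABLE t1;

-- Each n_diff_pfxNN row describes one column prefix of an index.
-- For the secondary index k_d the prefixes should continue past d
-- into the primary key columns i1 and i2.
SELECT index_name, stat_name, stat_description
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 't1'
  AND stat_name LIKE 'n_diff_pfx%';

-- A query that needs only d, i1 and i2 can be answered from k_d alone.
-- Check whether the Extra column reports "Using index".
EXPLAIN
SELECT i1, i2 FROM t1 WHERE d = '2000-01-01';
```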

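The MySQL optimizer takes the primary key columns of an extended secondary index into account when deciding whether and how to use that index. This can produce more efficient execution plans and better performance. Quite a few blog posts say that index extensions were introduced in MySQL 5.6.9, although I have not yet found this stated in the official documentation.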

The optimizer can use extended secondary indexes for ref, range, and index_merge index access, for loose index scans, for join and sorting optimization, and for MIN()/MAX() optimization.
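
The statements below are a rough sketch of queries on t1 that could benefit from some of these optimizations. They are illustrative only and not taken from the official example; whether the optimizer actually picks the extended index depends on the MySQL version, the statistics, and the data, so compare the EXPLAIN output with use_index_extensions on and off.

```sql
-- range access: with d fixed, a range on i1 maps to a range on (d, i1).
EXPLAIN
SELECT COUNT(*) FROM t1
WHERE d = '2000-01-01' AND i1 BETWEEN 2 AND 4;

-- Sorting: for a fixed d, entries of k_d are already ordered by (i1, i2),
-- so check whether "Using filesort" is absent from Extra.
EXPLAIN
SELECT d, i1, i2 FROM t1
WHERE d = '2000-01-01'
ORDER BY i1, i2;

-- MIN()/MAX(): for a fixed d, the largest i1 sits at the end of the
-- (d, i1) range, so the value may be resolved from the index directly.
EXPLAIN
SELECT MAX(i1) FROM t1 WHERE d = '2000-01-01';
```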

First, let's insert the test data. The script comes from the official documentation^[1]:

INSERT INTO t1 VALUES
(1, 1, '1998-01-01'), (1, 2, '1999-01-01'),
(1, 3, '2000-01-01'), (1, 4, '2001-01-01'),
(1, 5, '2002-01-01'), (2, 1, '1998-01-01'),
(2, 2, '1999-01-01'), (2, 3, '2000-01-01'),
(2, 4, '2001-01-01'), (2, 5, '2002-01-01'),
(3, 1, '1998-01-01'), (3, 2, '1999-01-01'),
(3, 3, '2000-01-01'), (3, 4, '2001-01-01'),
(3, 5, '2002-01-01'), (4, 1, '1998-01-01'),
(4, 2, '1999-01-01'), (4, 3, '2000-01-01'),
(4, 4, '2001-01-01'), (4, 5, '2002-01-01'),
(5, 1, '1998-01-01'), (5, 2, '1999-01-01'),
(5, 3, '2000-01-01'), (5, 4, '2001-01-01'),
(5, 5, '2002-01-01');

# By default, the index extensions flag (use_index_extensions) is on. It can be turned on or off for the current session through the optimizer_switch system variable.

mysql> show variables like '%optimizer_switch%';
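
Since optimizer_switch is one long comma-separated string, it can be handy to check only the flag of interest. The following is a minimal sketch; the column alias index_extensions_on is just an illustrative name.

```sql
-- Returns 1 when the flag is on for the current session, 0 otherwise.
SELECT @@SESSION.optimizer_switch LIKE '%use_index_extensions=on%'
       AS index_extensions_on;

-- After experimenting, the flag can be reset to its default value
-- for the current session.
SET SESSION optimizer_switch = 'use_index_extensions=default';
```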

mysql> SET optimizer_switch = 'use_index_extensions=off';
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN
    -> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref  | PRIMARY,k_d   | k_d  | 4       | const |    5 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

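In this case the optimizer does not use the primary key, because the primary key consists of the columns (i1, i2) and the query does not reference i2. Instead it chooses the secondary index k_d (d): it first finds the 5 entries with d = '2000-01-01' through k_d, then filters them with the condition i1 = 3.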

Now let's turn use_index_extensions back on in the current session. How does the execution plan change?

mysql> SET optimizer_switch = 'use_index_extensions=on';
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN
    -> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
|  1 | SIMPLE      | t1    | ref  | PRIMARY,k_d   | k_d  | 8       | const,const |    1 | Using index |
+----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

mysql>

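With use_index_extensions=off, the optimizer uses only the d column of k_d and ignores the appended primary key columns. With use_index_extensions=on, it treats k_d as (d, i1, i2) and can use the leftmost prefix (d, i1) for this query. The two EXPLAIN outputs above confirm this:

key_len: changes from 4 to 8, showing that the lookup uses not only the d column but also the appended primary key column i1. (d is a nullable DATE, 3 bytes plus 1 byte for the NULL flag; i1 is an INT NOT NULL, 4 bytes.)

ref: changes from const to "const,const", because the lookup now uses two key parts.

rows: drops from 5 to 1, indicating that InnoDB needs to examine fewer rows to produce the result.

Extra: changes from "Using where; Using index" to "Using index". The index lookup alone now satisfies the whole WHERE clause, and the query is answered from the index (a covering index) without reading any data rows. The official documentation describes it as follows: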

The Extra value changes from Using where; Using index to Using index. This means that rows can be read using only the index, without consulting columns in the data row.

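Testing on MySQL 5.7.x^[2] and MySQL 8^[3] gives a slightly different execution plan, shown below. With index extensions turned off, the plan uses the PRIMARY index with a key_len of 4, and Extra shows Using where. This means the rows matching i1 = 3 are first located through the primary key prefix i1 and then filtered with the condition d = '2000-01-01'. This is probably a result of the optimizer's strategy in these versions. Oddly, the official documentation for MySQL 5.7 and MySQL 8 still shows the same output as for MySQL 5.6; my guess is that the documentation has not been updated.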

mysql> SET optimizer_switch = 'use_index_extensions=off';
Query OK, 0 rows affected (0.00 sec)

mysql> explain
    -> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | PRIMARY,k_d   | PRIMARY | 4       | const |    5 |    20.00 | Using where |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql> SET optimizer_switch = 'use_index_extensions=on';
Query OK, 0 rows affected (0.00 sec)

mysql> explain
    -> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | PRIMARY,k_d   | k_d  | 8       | const,const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>
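
To compare more directly with the plan shown in the documentation on 5.7 or 8, one option is to force the secondary index while index extensions are off. This is only a sketch and was not part of the original test, so the resulting plan is not shown here.

```sql
SET optimizer_switch = 'use_index_extensions=off';

-- FORCE INDEX makes the optimizer consider only k_d for t1,
-- so the PRIMARY key can no longer be chosen for this query.
EXPLAIN
SELECT COUNT(*) FROM t1 FORCE INDEX (k_d)
WHERE i1 = 3 AND d = '2000-01-01';

-- Turn the flag back on when done.
SET optimizer_switch = 'use_index_extensions=on';
```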

In addition, the Handler_read_% status variables show how index entries and data rows were actually accessed during execution.

FLUSH TABLE   closes the open table(s) and clears the caches (table cache and query cache).
FLUSH STATUS  resets the status counters to zero.

Handler_read_key：The number of requests to read a row based on a key. If this value is high, it is a good indication that your tables are properly indexed for your queries.

Handler_read_next: The number of requests to read the next row in key order. This value is incremented if you are querying an index column with a range constraint or if you are doing an index scan. In other words, this counter reflects how many times the next row was fetched in index order during an index scan.

Status counters with use_index_extensions turned off:

mysql> SET optimizer_switch = 'use_index_extensions=off';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH TABLE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 5     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

Status counters with use_index_extensions turned on:

mysql> SET optimizer_switch = 'use_index_extensions=on';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH TABLE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 1     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

mysql>
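
On MySQL 5.7 and 8, the same counters can also be read with a regular SELECT, which makes it easy to show only the non-zero ones. A minimal sketch, assuming the performance_schema.session_status table is available (on 5.6 the corresponding table is information_schema.SESSION_STATUS):

```sql
FLUSH STATUS;

SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';

-- List only the Handler_read counters that the query above touched.
-- VARIABLE_VALUE is a string column, hence the comparison with '0'.
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.session_status
WHERE VARIABLE_NAME LIKE 'Handler_read%'
  AND VARIABLE_VALUE <> '0';
```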