引言

MySQL 中大表 truncate 有风险，线上环境遇到过 truncate 导致数据库 hang 进而连接打满的情况，本文介绍一个典型案例，truncate 直接导致数据库 crash。

其中基于源码分析信号量等待日志进而定位持锁与等锁的关系，并介绍了 truncate 的执行流程。

现象

时间：2023-05-17 09:40:45

现象：SQL工单执行失败

域名：mysql-cn-north.rds.jdcloud.com

版本：5.7.24

日志

EXECUTE START AT 2023-05-17 09:40:45
--------------
truncate information_accuracy_rate_abnormal
--------------
Query OK, 0 rows affected (0.10 sec)
--------------
truncate information_integrity_abnormal
--------------
Query OK, 0 rows affected (0.09 sec)
--------------
truncate information_timely_abnormal
--------------
Query OK, 0 rows affected (0.33 sec)
--------------
truncate line_timely_uncompleted
--------------
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2013 (HY000) at line 6: Lost connection to MySQL server during query
EXECUTE FAIL AT 2023-05-17 09:56:42

查看数据库，显示半小时前重启。

mysql> s
--------------
/export/servers/mysql/bin/mysql  Ver 14.14 Distrib 5.7.24, for Linux (x86_64) using  EditLine wrapper

Connection id:  1206
Current database: 
Current user:  admin@x.x.x.x
SSL:   Not in use
Current pager:  stdout
Using outfile:  ''
Using delimiter: ;
Server version:  5.7.24-log Source distribution
Protocol version: 10
Connection:  mysql-cn-north.rds.jdcloud.com via TCP/IP
Server characterset: utf8
Db     characterset: utf8
Client characterset: utf8
Conn.  characterset: utf8
TCP port:  3358
Uptime:   29 min 20 sec

Threads: 34  Questions: 19102  Slow queries: 0  Opens: 687  Flush tables: 1  Open tables: 680  Queries per second avg: 10.853

那么数据库重启的原因是什么呢？

分析

数据库重启时错误日志中会有记录，因此查看错误日志。

page_cleaner

08:42:04-08:43:13，刷脏慢

2023-05-17T08:42:04.063853+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4564ms. The settings might not be optimal. (flushed=3 and evicted=0, during the time.)
2023-05-17T08:42:12.056433+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6992ms. The settings might not be optimal. (flushed=3 and evicted=0, during the time.)
2023-05-17T08:43:13.204264+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 39382ms. The settings might not be optimal. (flushed=36 and evicted=0, during the time.)

A long semaphore wait

09:44:48 开始信号量等待。

2023-05-17T09:44:48.755390+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139851279259392 has waited at buf0flu.cc line 1209 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f348756e970 created in file buf0buf.cc line 1460
a writer (thread id 139864276502272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0trunc.cc line 75
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1000
2023-05-17T09:44:48.755521+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139851208840960 has waited at srv0srv.cc line 1982 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f324f489d38 created in file dict0dict.cc line 1183
a writer (thread id 139864276502272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1843
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

其中：

139851279259392 线程执行到 buf0flu.cc:1209 申请 SX-lock on RW-latch 发生锁等待
139851208840960 线程执行到 srv0srv.cc:1982 申请 X-lock on RW-latch 发生锁等待
139864276502272 线程持有 RW-latch 锁

可见前两个线程都是被 139864276502272 线程阻塞，因此查看该线程。

事务信息中显示 139864276502272 线程在执行truncate line_timely_uncompleted语句。

---TRANSACTION 847727, ACTIVE 300 sec truncating table
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 22476970, OS thread handle 139864276502272, query id 467657174 11.165.93.130 db_auto System lock
truncate line_timely_uncompleted

到目前可以确认执行 truncate 语句持锁未释放，阻塞了后续操作。

接下来将详细分析以下三个问题：

执行 truncate 语句持有什么锁
阻塞了后续什么操作
阻塞的原因

另外，139851208840960 线程是主线程，状态异常 state: enforcing dict cache limit，表明主线程处于检查字典缓冲区内存使用情况及清理状态。

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=28121, Main thread ID=139851208840960, state: enforcing dict cache limit
Number of rows inserted 635476595, updated 80131, deleted 552533418, read 1368627752
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 78.65 reads/s

因此，truncate 操作阻塞了主线程与其他线程。

此外，OS WAIT 如下所示。

OS WAIT ARRAY INFO: signal count 22505766
RW-shared spins 0, rounds 26778678, OS waits 5417193
RW-excl spins 0, rounds 312785650, OS waits 9185224
RW-sx spins 2571987, rounds 10416949, OS waits 47263
Spin rounds per wait: 26778678.00 RW-shared, 312785650.00 RW-excl, 4.05 RW-sx

crash

09:56:40 信号量等待超过 600s 后判定数据库 hung 后自动 crash。

2023-05-17T09:56:40.791164+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

Innodb 的 Diagnostic 线程检查到 RW-Latch 等待超过了600秒还没有返回，认为系统出现了严重问题，于是触发了 MySQL 服务的 Crash。

Assertion failure

2023-05-17 09:56:40 0x7f31a69ff700  InnoDB: Assertion failure in thread 139851225626368 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
01:56:40 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=147
max_threads=4020
thread_count=49
connection_count=48
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2128391 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/export/servers/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xebb7ac]
/export/servers/mysql/bin/mysqld(handle_fatal_signal+0x451)[0x7aa0f1]
/lib64/libpthread.so.0(+0xf370)[0x7f34d925b370]
/lib64/libc.so.6(gsignal+0x37)[0x7f34d7e5a1d7]
/lib64/libc.so.6(abort+0x148)[0x7f34d7e5b8c8]
/export/servers/mysql/bin/mysqld[0x77a6c9]
/export/servers/mysql/bin/mysqld(_ZN2ib5fatalD1Ev+0x4d)[0x108437d]
/export/servers/mysql/bin/mysqld(srv_error_monitor_thread+0xa30)[0x1024da0]
/lib64/libpthread.so.0(+0x7dc5)[0x7f34d9253dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f34d7f1c76d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

signal 6 一般是实例的磁盘空间不足或者磁盘只读，MySQL 的数据发生了 data corruption，内核层面 latch 锁的长时间的锁等待造成。不过这里的 data corruption、长时间的锁等待可能是硬盘损坏或者 MySQL 的 Bug 造成的。

一般来说，触发 Crash 的原因分成两类：

MySQL进程自己触发了Crash（这里称之为MySQL自杀）；
MySQL进程被OS杀死。

具体 crash 的原因需要具体分析。

crash recovery

自动启动的完整日志如下所示。

2023-05-17T01:56:42.445488Z mysqld_safe Number of processes running now: 0
2023-05-17T01:56:42.452216Z mysqld_safe mysqld restarted
2023-05-17T09:56:42.626810+08:00 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2023-05-17T09:56:42.627068+08:00 0 [Warning] 'NO_AUTO_CREATE_USER' sql mode was not set.
2023-05-17T09:56:42.627680+08:00 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2023-05-17T09:56:42.627727+08:00 0 [Note] /export/servers/mysql/bin/mysqld (mysqld 5.7.24-log) starting as process 86341 ...
2023-05-17T09:56:42.636258+08:00 0 [Note] InnoDB: PUNCH HOLE support available
2023-05-17T09:56:42.636756+08:00 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2023-05-17T09:56:42.637162+08:00 0 [Note] InnoDB: Uses event mutexes
2023-05-17T09:56:42.637546+08:00 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2023-05-17T09:56:42.637948+08:00 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2023-05-17T09:56:42.638339+08:00 0 [Note] InnoDB: Using Linux native AIO
2023-05-17T09:56:42.639906+08:00 0 [Note] InnoDB: Number of pools: 1
2023-05-17T09:56:42.640616+08:00 0 [Note] InnoDB: Using CPU crc32 instructions
2023-05-17T09:56:42.642820+08:00 0 [Note] InnoDB: Initializing buffer pool, total size = 11G, instances = 8, chunk size = 128M
2023-05-17T09:56:43.137062+08:00 0 [Note] InnoDB: Completed initialization of buffer pool
2023-05-17T09:56:43.212169+08:00 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2023-05-17T09:56:43.230818+08:00 0 [Note] InnoDB: Highest supported file format is Barracuda.
2023-05-17T09:56:43.281221+08:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 314316701989
2023-05-17T09:56:43.281265+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 314316749972
2023-05-17T09:56:43.284860+08:00 0 [Note] InnoDB: Database was not shutdown normally!
2023-05-17T09:56:43.284877+08:00 0 [Note] InnoDB: Starting crash recovery.
2023-05-17T09:56:43.351764+08:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2023-05-17T09:56:43.351791+08:00 0 [Note] InnoDB: Trx id counter is 848128
2023-05-17T09:56:43.396400+08:00 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
2023-05-17T09:56:44.000883+08:00 0 [Note] InnoDB: Apply batch completed
2023-05-17T09:56:44.000924+08:00 0 [Note] InnoDB: Last MySQL binlog file position 0 92528, file name mysql-bin.256731
2023-05-17T09:56:44.101934+08:00 0 [Note] InnoDB: Rolling back trx with id 847727, 0 rows to undo
2023-05-17T09:56:44.102679+08:00 0 [Note] InnoDB: Rollback of trx with id 847727 completed
2023-05-17T09:56:44.122901+08:00 0 [Note] InnoDB: Completing truncate for table with id (70) residing in file-per-table tablespace with id (54)
2023-05-17T09:56:44.172249+08:00 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2023-05-17T09:56:44.172287+08:00 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2023-05-17T09:56:44.172301+08:00 0 [Note] InnoDB: Need to create a new innodb_temporary data file 'max'.
2023-05-17T09:56:44.172338+08:00 0 [Note] InnoDB: Setting file '/export/data/mysql/data/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2023-05-17T09:56:44.182453+08:00 0 [Note] InnoDB: File '/export/data/mysql/data/ibtmp1' size is now 12 MB.
2023-05-17T09:56:44.182523+08:00 0 [Note] InnoDB: Setting file '/export/data/mysql/data/max' size to 10240 MB. Physically writing the file full; Please wait ...
2023-05-17T09:56:44.182653+08:00 0 [Note] InnoDB: Progress in MB:
 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900 10000 10100 10200
2023-05-17T09:56:52.434082+08:00 0 [Note] InnoDB: File '/export/data/mysql/data/max' size is now 10240 MB.
2023-05-17T09:56:52.435514+08:00 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2023-05-17T09:56:52.435543+08:00 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2023-05-17T09:56:52.436881+08:00 0 [Note] InnoDB: Waiting for purge to start
2023-05-17T09:56:52.486994+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 9276ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2023-05-17T09:56:52.486994+08:00 0 [Note] InnoDB: 5.7.24 started; log sequence number 314316749972
2023-05-17T09:56:52.488067+08:00 0 [Note] Plugin 'FEDERATED' is disabled.
2023-05-17T09:56:52.490374+08:00 0 [Note] InnoDB: Loading buffer pool(s) from /export/data/mysql/data/ib_buffer_pool
2023-05-17T09:56:52.492537+08:00 0 [Note] Recovering after a crash using mysql-bin
2023-05-17T09:56:52.492569+08:00 0 [Note] Starting crash recovery...
2023-05-17T09:56:52.492594+08:00 0 [Note] Crash recovery finished.
2023-05-17T09:56:52.498265+08:00 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2023-05-17T09:56:52.498287+08:00 0 [Note] Server hostname (bind-address): '*'; port: 3358
2023-05-17T09:56:52.498334+08:00 0 [Note] IPv6 is available.
2023-05-17T09:56:52.498352+08:00 0 [Note]   - '::' resolves to '::';
2023-05-17T09:56:52.498409+08:00 0 [Note] Server socket created on IP: '::'.
2023-05-17T09:56:52.499998+08:00 0 [Note] InnoDB: Buffer pool(s) load completed at 230517  9:56:52
2023-05-17T09:56:52.503341+08:00 0 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.503375+08:00 0 [Warning] 'user' entry 'mysql.session@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.503386+08:00 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.503513+08:00 0 [Warning] 'db' entry 'performance_schema mysql.session@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.503521+08:00 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.503643+08:00 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.508426+08:00 0 [Warning] 'tables_priv' entry 'user mysql.session@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.508446+08:00 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
2023-05-17T09:56:52.528011+08:00 0 [Note] Event Scheduler: Loaded 0 events
2023-05-17T09:56:52.528137+08:00 0 [Note] /export/servers/mysql/bin/mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/export/data/mysql/tmp/mysql.sock'  port: 3358  Source distribution
2023-05-17T09:57:40.231846+08:00 39 [Note] Got an error reading communication packets
2023-05-17T09:57:42.439163+08:00 43 [Note] Start binlog_dump to master_thread_id(43) slave_server(1683065848), pos(, 4)
2023-05-17T09:57:42.439219+08:00 43 [Note] Start asynchronous binlog_dump to slave (server_id: 1683065848), pos(, 4)
2023-05-17T09:57:42.442848+08:00 42 [Note] Start binlog_dump to master_thread_id(42) slave_server(1683031521), pos(, 4)
2023-05-17T09:57:42.442878+08:00 42 [Note] Start semi-sync binlog_dump to slave (server_id: 1683031521), pos(, 4)
2023-05-17T09:57:42.615008+08:00 8 [Note] Semi-sync replication initialized for transactions.
2023-05-17T09:57:42.615042+08:00 8 [Note] Semi-sync replication enabled on the master.
2023-05-17T09:57:42.615124+08:00 0 [Note] Starting ack receiver thread
2023-05-17T09:57:44.167561+08:00 49 [Note] Event Scheduler: scheduler thread started with id 49

日志显示数据库异常关机后崩溃恢复。

[Note] InnoDB: Log scan progressed past the checkpoint lsn 314316701989
[Note] InnoDB: Doing recovery: scanned up to log sequence number 314316749972
[Note] InnoDB: Database was not shutdown normally!
[Note] InnoDB: Starting crash recovery.

其中通过 LSN 判断数据库是否异常关闭。

正常关闭时 checkpoint_lsn 等于 flush_lsn，异常关闭时两者不等，则判定需要进行 crash recovery。其中：

checkpoint_lsn 等于 redo 中第一个 redo record 的 LSN；

flush_lsn 是数据库启动时打开系统表空间 ibdata 读取到的存储在其中的 LSN，表示上次 shutdown 时的 checkpoint 点，在后面做崩溃恢复时会用到。

日志显示某一个表在重启之前做过 truncate，而在重启的时候，需要把这个表 redo 一次 truncate，用于修复执行 truncate 期间数据库意外崩溃或非正常关闭可能导致的非系统表空间损坏。

2023-05-17T09:56:44.122901+08:00 0 [Note] InnoDB: Completing truncate for table with id (70) residing in file-per-table tablespace with id (54)

监控也可以印证这一点，数据库重启过程中内存使用率下降，数据量减小，表明数据库启动后执行 truncate 导致磁盘空间大量释放。

MySQL 大表 truncate 导致数据库 crash — 1688781692770

启动过程中调用 fixup_tables_in_non_system_tablespace 方法修复表空间，【实战经验分享】从一个案例说明MySQL源码的重要性中介绍了一个案例，数据库卡死后重启起不来，就与该方法有关。

堆栈显示 IO 线程与启动线程都处理 idle 状态，其中刷盘线程认为没有东西可以刷，所以处于 idle 状态，而启动线程认为有东西需要刷的，所以一直在等，信息没有对齐，所以死等了。原因是：

启动线程认为需要刷脏的原因是 innodb 启动的关键函数 innobase_start_or_create_for_mysql 调用 truncate_t::fixup_tables_in_non_system_tablespace 函数修复 truncate 导致的表空间损坏时会产生脏页，因此需要调用 log_make_checkpoint_at 函数刷脏。

刷盘线程认为不需要刷脏的原因是 innodb 是把启动过程中和启动之后正常运行所产生的脏页分阶段刷掉的，因此 innodb 恢复过程中在将 redo 重放产生的脏页刷到文件后，在数据库启动完成前，正常的刷盘还不能开始，需要等待，除非 buf_flush_event 事件被 set 一次。但是刷盘线程在 buf_flush_event 被 set 之前提前退出，而 InnoDB 认为 fixup 产生的脏页应该由启动之后的刷盘线程来刷掉。最终在 buf_flush_wait_flushed 函数处等待 buf_flush_event 被 set，而该函数是个死循环，因此导致启动线程死等。

该问题是 Percona 的 BUG。

原理

源码

依次分析两类信号量日志

第一类 139851279259392 线程等待。

2023-05-17T09:44:48.755390+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139851279259392 has waited at buf0flu.cc line 1209 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f348756e970 created in file buf0buf.cc line 1460
a writer (thread id 139864276502272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0trunc.cc line 75
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1000

其中：

Thread 139851279259392 has waited at buf0flu.cc line 1209

buf0flu.cc:1209 处 rw_lock_sx_lock_gen 方法由 buf_flush_page 方法调用。buf_flush_page 是刷脏时调用的方法，用于将 buffer pool 中的脏页异步刷新到数据文件。刷脏申请 SX 锁，期间其他线程可以读，但是不允许写该 block。

/********************************************************************//**
Writes a flushable page asynchronously from the buffer pool to a file.
NOTE: in simulated aio we must call
os_aio_simulated_wake_handler_threads after we have posted a batch of
writes! NOTE: buf_pool->mutex and buf_page_get_mutex(bpage) must be
held upon entering this function, and they will be released by this
function if it returns true.
@return TRUE if the page was flushed */
ibool
buf_flush_page(...){
    ...
    if (flush) {

        mutex_exit(block_mutex);

        buf_pool_mutex_exit(buf_pool);

        // 加锁前，判断锁冲突
        if (flush_type == BUF_FLUSH_LIST
            && is_uncompressed
            && !rw_lock_sx_lock_nowait(rw_lock, BUF_IO_WRITE)) {

            if (!fsp_is_system_temporary(bpage->id.space())) {
                /* avoiding deadlock possibility involves
                doublewrite buffer, should flush it, because
                it might hold the another block->lock. */
                buf_dblwr_flush_buffered_writes();
            } else {
                buf_dblwr_sync_datafiles();
            }
   
            // 加SX锁
1209     rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE);
        }
    ...
}

SX 锁是什么锁呢？

InnoDB 层有三种锁：S（共享锁）、X（排他锁）和 SX（共享排他锁）。

其中 SX 与 SX、X 是互斥锁。加 SX 不会影响读，只会阻塞写。

MySQL 5.7.2 版本中引入 SX，用于提高索引的读写效率，在此之前 InnoDB 的 btree 发生合并/分裂等可能修改 B-tree 的操作时，都需要对其加排他的索引锁，这时候是无法对该索引进行读写操作的。

SX-lock on RW-latch at 0x7f348756e970 created in file buf0buf.cc line 1460

buf0buf.cc:1460 处 rw_lock_create 方法由 buf_block_init 方法调用。buf_block_init 是数据库启动时初始化 buffer pool 过程中调用的方法，用于将数据保存在缓存中，并初始化用于同步访问缓存数据块 block 的锁，包括 rw_lock_create 创建的读写锁。

/********************************************************************//**
Initializes a buffer control block when the buf_pool is created. */
static
void
buf_block_init(...){
    ...
1460 rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);
    ...
}

第二类 139851208840960 线程等待。

2023-05-17T09:44:48.755521+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139851208840960 has waited at srv0srv.cc line 1982 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f324f489d38 created in file dict0dict.cc line 1183
a writer (thread id 139864276502272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1843

其中：

Thread 139851208840960 has waited at srv0srv.cc line 1982

srv0srv.cc:1982 处 rw_lock_x_lock 方法由 srv_master_do_active_tasks 方法调用。srv_master_do_active_tasks 是主线程每隔 SRV_MASTER_DICT_LRU_INTERVAL 时间（47s）调用的方法，用于检查并淘汰保存 InnoDB 层表定义缓存的 LRU 链表。清理表缓存申请 dict_operation_lock 的 X 锁，期间其他线程不可读写。

// state: enforcing dict cache limit, cur_time % SRV_MASTER_DICT_LRU_INTERVAL == 0
- srv_master_do_active_tasks

// Make room in the table cache by evicting an unused table
-- srv_master_evict_from_table_cache

--- rw_lock_x_lock(dict_operation_lock)

与前文 MySQL semaphore wait 导致连接打满中是同一个方法，但是具体行数不同，原因是数据库小版本不一致，前文是 5.7.33，本文是 5.7.24。

X-lock on RW-latch at 0x7f324f489d38 created in file dict0dict.cc line 1183

dict0dict.cc:1183 处 rw_lock_create 方法由 dict_init 方法调用。dict_init 用于初始化据字典，其中存储有关数据库、表和列的元数据信息的系统表。这些元数据信息包括表结构、索引信息、约束信息等。

/**********************************************************************//**
Inits the data dictionary module. */
void
dict_init(void){
  ...
1183  rw_lock_create(dict_operation_lock_key,
      dict_operation_lock, SYNC_DICT_OPERATION);
  ...
}

因此，判断：

139851279259392 线程刷脏申请 SX 锁时发生锁等待，原因是 0x7f348756e970 线程持有初始化 buffer pool 时创建的 RW-latch；
139851208840960 线程清理表缓存申请 X 锁时发生锁等待，原因是 0x7f324f489d38 线程持有初始化 data dictionary 时创建的 RW-latch。

truncate 持有的锁是什么？

两类信号量等待日志结尾处的 Last time write locked 显示都与 truncate 操作有关，因此查看对应代码。

Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1000

row0trunc.cc:1000 处 btr_pcur_restore_position 方法由 DropIndex 方法调用。

/**
Drop an index in the table.

@param mtr mini-transaction covering the read
@param pcur persistent cursor used for reading
@return DB_SUCCESS or error code */
dberr_t
DropIndex::operator()(mtr_t* mtr, btr_pcur_t* pcur) const
{
     rec_t* rec = btr_pcur_get_rec(pcur);
  bool freed = dict_drop_index_tree(rec, pcur, mtr);
     if (freed) {

            /* We will need to commit and restart the
            mini-transaction in order to avoid deadlocks.
            The dict_drop_index_tree() call has freed
            a page in this mini-transaction, and the rest
            of this loop could latch another index page.*/
            const mtr_log_t log_mode = mtr->get_log_mode();
            mtr_commit(mtr);

            mtr_start(mtr);
            mtr->set_log_mode(log_mode);

1000        btr_pcur_restore_position(BTR_MODIFY_LEAF, pcur, mtr);
        }
     ...
}

这里参考 chatgpt 查询 btr_pcur_restore_position 的作用，然后发现提问的方式相当重要。

btr_pcur_restore_position 方法用于恢复索引中游标的位置，truncate 过程中清空表中索引数据后，游标的位置将失效，因此需要调用 btr_pcur_restore_position 方法恢复游标的位置。

Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1843

row0trunc.cc:1843 处 row_mysql_lock_data_dictionary 方法由 row_truncate_table_for_mysql 方法调用。row_mysql_lock_data_dictionary 方法用于锁定数据字典（data dictionary）。在执行 TRUNCATE TABLE 句时，需要锁定数据字典来确保对表的删除操作不会与其他需要访问数据字典的操作发生冲突，从而保证数据字典的一致性。

/**
Truncates a table for MySQL.
@param table  table being truncated
@param trx  transaction covering the truncate
@return error code or DB_SUCCESS */
dberr_t
row_truncate_table_for_mysql(
 dict_table_t* table,
 trx_t* trx)
{
        ...
        /* Step-3: Validate ownership of needed locks (Exclusive lock).
        Ownership will also ensure there is no active SQL queries, INSERT,
        SELECT, .....*/
        trx->op_info = "truncating table";
        ut_a(trx->dict_operation_lock_mode == 0);
1843    row_mysql_lock_data_dictionary(trx);
        ut_ad(mutex_own(&dict_sys->mutex));
        ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_X));
    ...
}

另外，1845 行调用 ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_X)) 获取到了 dict_operation_lock 的 X 锁。

其中显示 row_mysql_lock_data_dictionary 方法位于 Step-3，具体将在下文 truncate 流程中介绍。

因此判断执行 TRUNCATE TABLE 句时，需要锁定缓存池中的数据页与数据字典，因此导致异步线程刷脏与主线程淘汰字典缓存时锁等待。

根据网络资料，判断 truncate table 持锁时间长与 AHI 有关。

AHI

首先介绍下 AHI。

自适应哈希索引（innodb_adaptive_hash_index，AHI）是一个哈希表，key 是检索条件，value 是数据页。

InnoDB 存储引擎中索引基于 B+ 树实现，定位数据的 IO 次数取决于树的高度。为提高查询效率，引入自适应哈希索引，自动根据访问频率和模式给热点页建立哈希索引，而查询哈希索引的时间复杂度为 O(1)。

因此自适应哈希索引通过跳过逐层定位的步骤可以直接查询到某个检索条件对应的数据页。

对比下自适应哈希索引与查询缓存：

相同点是都是在缓存中保存数据提高查询效率，数据结构都是哈希表，key 都是检索条件；
不同点是自适应哈希索引的 value 是索引的数据页，查询缓存的 value 是查询结果。

AHI 的主要缺点：

仅适用于等值查询，不适用于其他查询类型，如范围查询无法使用哈希索引，因此需要关注 AHI 的命中率；
可能会加剧锁竞争，原因是凡是缓存都会涉及多个缓存消费者间的锁竞争。MySQL 通过设立多个 AHI 分区，每个分区使用独立的锁，来减少锁竞争；
低版本中 AHI 存在 bug。低版本是存在一些与 AHI 相关的影响业务的缺陷，在新版本上均已修复，新版本 MySQL 可放心使用。下面即将介绍其中一个 bug。

根据官方文档，在开启 AHI 时，执行 truncate 操作可能导致数据库性能下降，原因是删除自适应 hash 需要 scan buffer pool 的 LRU 链表。

MySQL 5.7.23 中针对 drop table 语句已经修复了这个问题，而 truncate table 未修复。

On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled, TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table’s adaptive hash index entries. The problem was addressed for [DROP TABLE “13.1.29 DROP TABLE Statement”) in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184).

为什么 drop table 修复而 truncate table 未修复呢？

原因是 truncate table 需要复用 space id，因此必须把 buffer pool 中的老的表中的页全部删除，而 drop table 因为新旧表的页可用通过 space id 区分，只需要把 flush list 中的脏页删除就可以了，也就是可以用 drop+create 代替 truncate 来解决大 buffer pool 夯的问题，很遗憾这个修改实际上是在 8.0 上做的。

那么，truncate table 时 AHI 状态如何呢？

查看 crash 前的错误日志中 INSERT BUFFER AND ADAPTIVE HASH INDEX 的信息。

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5550, seg size 5552, 15300623 merges
merged operations:
 insert 27899, delete mark 415437554, delete 3273
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2921951, node heap has 1 buffer(s)
Hash table size 2921951, node heap has 381 buffer(s)
Hash table size 2921951, node heap has 201934 buffer(s)
Hash table size 2921951, node heap has 13 buffer(s)
Hash table size 2921951, node heap has 196 buffer(s)
Hash table size 2921951, node heap has 12 buffer(s)
Hash table size 2921951, node heap has 2 buffer(s)
Hash table size 2921951, node heap has 2 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

其中：

delete mark 415437554，表明有大量数据页被标记为删除；
共 8 个AHI 分区，每个哈希表大小为 2921951，表示哈希表的槽数量。

监控显示数据库启动后 free_list 急剧增加，表明瞬间释放大量页。

实际上可以通过监控 Innodb_buffer_pool_pages_misc 参数来验证 AHI 信息。

监控显示 truncate table 期间 CPU 使用率升高到 50%。

GreatSQL删除分区慢的跟踪中介绍了另外一个 AHI 导致删除分区慢的案例。

perf 显示系统调用中有大量的自适应 hash 相关的调用。

原因是 drop 分区和 add 分区都会循环每个分区调用函数 btr_drop_ahi_for_table 、btr_search_drop_page_hash_index 清空所有分区及索引的的 AHI 信息。

关闭自适应 hash 后，相同的操作过程，删除第一个分区的时间明显变短，删除每个分区的时间基本上一致。

原因是删除第1个分区时 AHI 信息被清空，删第2个分区时 buffer 中已经没有 AHI 信息了，所有删除第2个分区就很快了。

truncate 流程

源码 row_truncate_table_for_mysql 方法中给出了 truncate 方法的执行流程。

/* Understanding the truncate flow.

Step-1: Perform intiial sanity check to ensure table can be truncated.
This would include check for tablespace discard status, ibd file
missing, etc ....

Step-2: Start transaction (only for non-temp table as temp-table don't
modify any data on disk doesn't need transaction object).

Step-3: Validate ownership of needed locks (Exclusive lock).
Ownership will also ensure there is no active SQL queries, INSERT,
SELECT, .....

Step-4: Stop all the background process associated with table.

Step-5: There are few foreign key related constraint under which
we can't truncate table (due to referential integrity unless it is
turned off). Ensure this condition is satisfied.

Step-6: Truncate operation can be rolled back in case of error
till some point. Associate rollback segment to record undo log.

Step-7: Generate new table-id.
Why we need new table-id ?
Purge and rollback case: we assign a new table id for the table.
Since purge and rollback look for the table based on the table id,
they see the table as 'dropped' and discard their operations.

Step-8: Log information about tablespace which includes
table and index information. If there is a crash in the next step
then during recovery we will attempt to fixup the operation.

Step-9: Drop all indexes (this include freeing of the pages
associated with them).

Step-10: Re-create new indexes.

Step-11: Update new table-id to in-memory cache (dictionary),
on-disk (INNODB_SYS_TABLES). INNODB_SYS_INDEXES also needs to
be updated to reflect updated root-page-no of new index created
and updated table-id.

Step-12: Cleanup Stage. Reset auto-inc value to 1.
Release all the locks.
Commit the transaction. Update trx operation state.

Notes:
- On error, log checkpoint is done followed writing of magic number to
truncate log file. If servers crashes after truncate, fix-up action
will not be applied.

- log checkpoint is done before starting truncate table to ensure
that previous REDO log entries are not applied if current truncate
crashes. Consider following use-case:
 - create table .... insert/load table .... truncate table (crash)
 - on restart table is restored .... truncate table (crash)
 - on restart (assuming default log checkpoint is not done) will have
   2 REDO log entries for same table. (Note 2 REDO log entries
   for different table is not an issue).
For system-tablespace we can't truncate the tablespace so we need
to initiate a local cleanup that involves dropping of indexes and
re-creating them. If we apply stale entry we might end-up issuing
drop on wrong indexes.

- Insert buffer: TRUNCATE TABLE is analogous to DROP TABLE,
so we do not have to remove insert buffer records, as the
insert buffer works at a low level. If a freed page is later
reallocated, the allocator will remove the ibuf entries for
it. When we prepare to truncate *.ibd files, we remove all entries
for the table in the insert buffer tree. This is not strictly
necessary, but we can free up some space in the system tablespace.

- Linear readahead and random readahead: we use the same
method as in 3) to discard ongoing operations. (This is only
relevant for TRUNCATE TABLE by TRUNCATE TABLESPACE.)
Ensure that the table will be dropped by trx_rollback_active() in
case of a crash.
*/

简单总结下 truncate table 的执行流程：

1、检查表空间状态，确认可以执行 truncate 操作；
2、开启事务（只针对非临时表，因为临时表不对磁盘上的数据进行任何修改，不需要事务对象）；
3、验证所需锁的拥有权（排他锁），确保没有活跃的 SQL 持有锁；
4、停止与表相关的全部后台进程；
5、检查外键约束，原因是 truncate 不支持带有外键的表；
6、异常回滚，并将回滚段 rollback segment 与 undo log 相关联；
7、生成新的 table-id；
8、记录关于表空间的信息，包括表和索引信息。如果在下一步中崩溃了，那么在恢复期间将尝试修复该操作；
9、删除表上的全部索引；
10、创建表上的全部索引；
11、将生成的 table-id 更新到表的元数据，包括内存的字典缓存与磁盘的 INNODB_SYS_TABLES（更新表 id）、INNODB_SYS_INDEXES（更新索引的根页号）；
12、清理阶段，将表的自增值重置为 1，释放所有的锁并提交事务。

其中：

2-12 期间 truncate 操作持锁未释放，因此 truncate 慢会导致后续操作锁等待；
8 中记录了原表的表空间信息，因此 crash recovery 过程中修复表空间并重新执行 truncate；
7 中生成了新的 table_id，但是上面也有提到 truncate 会复用 space id，table_id 是表级别的，space_id 是表空间级别的。

下面测试可以证明 truncate 前后 table_id 变化，而 space_id 不变。

mysql> select name,table_id,space from INFORMATION_SCHEMA.INNODB_SYS_TABLES where name='test_zk/t1';
+------------+----------+-------+
| name       | table_id | space |
+------------+----------+-------+
| test_zk/t1 |      193 |   179 |
+------------+----------+-------+
1 row in set (0.00 sec)

mysql> truncate table t1;
Query OK, 0 rows affected (0.01 sec)

mysql> select name,table_id,space from INFORMATION_SCHEMA.INNODB_SYS_TABLES where name='test_zk/t1';
+------------+----------+-------+
| name       | table_id | space |
+------------+----------+-------+
| test_zk/t1 |      205 |   179 |
+------------+----------+-------+
1 row in set (0.00 sec)

处理

执行

由于 truncate 工单未执行完成，因此重提工单再次执行。

执行前查看 AHI。

INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5550, seg size 5552, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

关闭 AHI。

mysql> show variables like '%adapt%';
+----------------------------------+--------+
| Variable_name                    | Value  |
+----------------------------------+--------+
| innodb_adaptive_flushing         | ON     |
| innodb_adaptive_flushing_lwm     | 10     |
| innodb_adaptive_hash_index       | ON     |
| innodb_adaptive_hash_index_parts | 8      |
| innodb_adaptive_max_sleep_delay  | 150000 |
+----------------------------------+--------+
5 rows in set (0.00 sec)

mysql> set global innodb_adaptive_hash_index=0;
Query OK, 0 rows affected (0.03 sec)

再次查看 AHI。

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5550, seg size 5552, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
Hash table size 2921951, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

重新执行日志

EXECUTE START AT 2023-05-17 10:41:49
--------------
truncate information_accuracy_rate_abnormal
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
truncate information_integrity_abnormal
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
truncate information_timely_abnormal
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
truncate line_timely_uncompleted
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
truncate line_timely_analysis
--------------
Query OK, 0 rows affected (0.14 sec)
--------------
truncate pick_timely_rate_abnormal
--------------
Query OK, 0 rows affected (0.08 sec)
mysql: [Warning] Using a password on the command line interface can be insecure.
EXECUTE DONE AT 2023-05-17 10:41:50

其中：

之前执行到 line_timely_uncompleted 表导致数据库 crash，而这次执行该表仅用时 0.00 s，原因是 crash recovery 过程中已重新执行。

知识点

异常分析方法

在遇到数据库 hang 或 crash 时可以结合错误日志分析上下文，官方文档中也说了，95% 的 bug 都与特定的语句有关，因此需要重点参考 show engine innodb staus 中记录的事务信息。

About 95% of all bugs are related to a particular query. Normally, this is one of the last queries in the log file just before the server restarts.

比如信号量日志中显示 139851279259392 线程等锁，139864276502272 线程持锁。

--Thread 139851279259392 has waited at buf0flu.cc line 1209 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f348756e970 created in file buf0buf.cc line 1460
a writer (thread id 139864276502272) has reserved it in mode  exclusive

而事务信息中显示 139864276502272 线程在执行 truncate 语句，表明正是 truncate 阻塞了其他线程。

---TRANSACTION 847727, ACTIVE 300 sec truncating table
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 22476970, OS thread handle 139864276502272, query id 467657174 11.165.93.130 db_auto System lock
truncate line_timely_uncompleted

但是信号量日志中没有记录持锁线程执行到的进度，因此不便于分析持锁与等锁的依赖关系。

不过结合 Last time read locked 与 Last time write locked 有助于分析，显示执行到了 row0trunc.cc:1843。

Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/zhg/mysql-5.7.24/storage/innobase/row/row0trunc.cc line 1843

源码中显示 row0trunc.cc:1843 处 row_mysql_lock_data_dictionary 方法由 row_truncate_table_for_mysql 方法调用。row_mysql_lock_data_dictionary 方法用于锁定数据字典（data dictionary）。

结论

SQL 工单中执行 truncate table 失败，报错 Lost connection to MySQL server during query，原因是数据库异常重启。

数据库异常关闭的原因是 truncate 导致信号量等待，而当信号量等待超过 600s 后判定数据库 hung 后自动 crash。

信号量等待的原因是执行 TRUNCATE TABLE 句时，需要锁定缓存池中的数据页与数据字典，因此导致异步线程刷脏与主线程淘汰字典缓存时锁等待。

truncate 需要长时间持锁的原因是在开启 AHI 时，执行 truncate 操作可能导致数据库性能下降，原因是删除自适应 hash 需要 scan buffer pool 的 LRU 链表。

drop 操作没有这个问题，原因是 truncate table 需要复用 space id，因此必须把 buffer pool 中的老的表中的页全部删除，而 drop table 因为新旧表的页可用通过 space id 区分，只需要把 flush list 中的脏页删除就可以了，也就是可以用 drop+create 代替 truncate 来解决大 buffer pool 夯的问题，很遗憾这个修改实际上是在 8.0 上做的。

开启 AHI 的原因是可以自动根据访问频率和模式给热点页建立哈希索引，其中保存检索条件与数据页的对应关系。基于缓存中的哈希表可以跳过逐层定位的步骤直接查询到某个检索条件对应的数据页从而提高热点页的查询效率。