引言

本文主要参考网易数据库团队分析 RC 级别下并发 insert 锁超时问题的思路，通过 GDB 调试结合源码分析 delete + insert 导致二级唯一索引申请两组 next-key lock 的原因。

现象

准备数据

数据库版本：5.7.24

事务隔离级别：RR

测试数据

CREATE TABLE `t_dupp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `age` int(11) DEFAULT NULL,
  `name` varchar(10) DEFAULT NULL,
  `a` int(11) DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_age_name` (`age`,`name`)
) ENGINE=InnoDB CHARSET=utf8 ROW_FORMAT=Compact;

insert into t_dupp(age,name) values(1,'a'),(2,'b'),(3,'c'),(4,'d');

其中：

测试表有主键与两个字段组成的联合唯一键。

查看数据

mysql> select * from t_dupp;
+----+------+------+------+
| id | age  | name | a    |
+----+------+------+------+
|  1 |    1 | a    |    0 |
|  2 |    2 | b    |    0 |
|  3 |    3 | c    |    0 |
|  4 |    4 | d    |    0 |
+----+------+------+------+
4 rows in set (0.00 sec)

锁等待

两个事务分别执行 delete + insert 导致锁等待。

session 1	session 2
begin; delete from t_dupp where age=3 and name=’c’;
	begin; delete from t_dupp where age=2 and name=’b’;
insert into t_dupp value(3, 3, ‘c’, 0);
	insert into t_dupp value(2, 2, ‘b’, 0);
	Lock wait timeout exceeded

其中：

事务 1 delete + insert 第三行记录；
事务 2 delete + insert 第二行记录报错锁等待超时。

因此问题就是为什么操作第三行会影响到第二行，这两条记录理论上没有冲突。

两组 next-key lock

查看事务信息

---TRANSACTION 13227842, ACTIVE 22 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 2
MySQL thread id 3615843, OS thread handle 139677280687872, query id 63626932 localhost admin update
insert into t_dupp value(2, 2, 'b', 0)
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227842 lock mode S waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 1; hex 63; asc c;;
 2: len 4; hex 80000003; asc     ;;

------------------
TABLE LOCK table `test_zk`.`t_dupp` trx id 13227842 lock mode IX
RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227842 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 1; hex 62; asc b;;
 2: len 4; hex 80000002; asc     ;;

RECORD LOCKS space id 2158 page no 3 n bits 72 index PRIMARY of table `test_zk`.`t_dupp` trx id 13227842 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000000c9d742; asc      B;;
 2: len 7; hex 5a0000003c02a1; asc Z   <  ;;
 3: len 4; hex 80000002; asc     ;;
 4: len 1; hex 62; asc b;;
 5: len 4; hex 80000000; asc     ;;

RECORD LOCKS space id 2158 page no 3 n bits 72 index PRIMARY of table `test_zk`.`t_dupp` trx id 13227842 lock mode S
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000000c9d742; asc      B;;
 2: len 7; hex 5a0000003c02a1; asc Z   <  ;;
 3: len 4; hex 80000002; asc     ;;
 4: len 1; hex 62; asc b;;
 5: len 4; hex 80000000; asc     ;;

RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227842 lock mode S
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 1; hex 62; asc b;;
 2: len 4; hex 80000002; asc     ;;

RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227842 lock mode S waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 1; hex 63; asc c;;
 2: len 4; hex 80000003; asc     ;;

---TRANSACTION 13227841, ACTIVE 37 sec
5 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 2
MySQL thread id 3615842, OS thread handle 139677270038272, query id 63626931 localhost admin
TABLE LOCK table `test_zk`.`t_dupp` trx id 13227841 lock mode IX
RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227841 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 1; hex 63; asc c;;
 2: len 4; hex 80000003; asc     ;;

RECORD LOCKS space id 2158 page no 3 n bits 72 index PRIMARY of table `test_zk`.`t_dupp` trx id 13227841 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 6; hex 000000c9d741; asc      A;;
 2: len 7; hex 590000002a1264; asc Y   * d;;
 3: len 4; hex 80000003; asc     ;;
 4: len 1; hex 63; asc c;;
 5: len 4; hex 80000000; asc     ;;

RECORD LOCKS space id 2158 page no 3 n bits 72 index PRIMARY of table `test_zk`.`t_dupp` trx id 13227841 lock mode S
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 6; hex 000000c9d741; asc      A;;
 2: len 7; hex 590000002a1264; asc Y   * d;;
 3: len 4; hex 80000003; asc     ;;
 4: len 1; hex 63; asc c;;
 5: len 4; hex 80000000; asc     ;;

RECORD LOCKS space id 2158 page no 4 n bits 72 index uk_age_name of table `test_zk`.`t_dupp` trx id 13227841 lock mode S
Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000003; asc     ;;
 1: len 1; hex 63; asc c;;
 2: len 4; hex 80000003; asc     ;;

Record lock, heap no 5 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000004; asc     ;;
 1: len 1; hex 64; asc d;;
 2: len 4; hex 80000004; asc     ;;

两个事务的加锁规则相同，以 delete + insert 第二行为例，持有以下 5 把锁。

索引	mode	gap_mode	data
二级唯一索引	S	record	(2, ‘b’, 2)
主键索引	X	record	2
主键索引	S	next-key	2
二级唯一索引	S	next-key	(2, ‘b’, 2)
二级唯一索引	S	next-key	(3, ‘c’, 3)

其中最诡异的就是 delete + insert 持有二级唯一索引的两组 next-key lock，这也是本文讨论的话题。

前置知识

隐式锁

insert 默认不加锁，也就是隐式锁，即使与其他事务唯一键冲突，也是将其他事务的隐式锁提升为显式 X 型 record lock，自己申请 S 型 next-key lock。

但是测试显示 delete + insert 申请两组 S 型 next-key lock，因此说明还有其他逻辑。

为了更明确的看到加锁流程，下文中使用 GDB 调试源码。

如下所示，主键与唯一键唯一性检查时都会调用lock_rec_convert_impl_to_expl函数将冲突记录的隐式锁转换为显式锁，然后给自己加锁。

dberr_t
lock_clust_rec_read_check_and_lock(){
  
  // 判断记录上是否存在隐式锁，如果存在则将其转换为显示锁
 if (heap_no != PAGE_HEAP_NO_SUPREMUM) {

  lock_rec_convert_impl_to_expl(block, rec, index, offsets);
 }

 // 加S锁, 如果上面的隐式锁转化成功, 此处加锁将会等待, 直到活跃事务释放锁
 err = lock_rec_lock(FALSE, mode | gap_mode, block, heap_no, index, thr);
  
}

dberr_t
lock_sec_rec_read_check_and_lock(){
  
  /* Some transaction may have an implicit x-lock on the record only
 if the max trx id for the page >= min trx id for the trx list or a
 database recovery is running. */

 if ((page_get_max_trx_id(block->frame) >= trx_rw_min_trx_id()
      || recv_recovery_is_on())
     && !page_rec_is_supremum(rec)) {

  lock_rec_convert_impl_to_expl(block, rec, index, offsets);
 }

 err = lock_rec_lock(FALSE, mode | gap_mode,
       block, heap_no, index, thr);
  
}

lock_rec_convert_impl_to_expl函数中判断该记录上的事务是否活跃，如果活跃，会帮该事务将隐式锁提升为显式锁。

/*********************************************************************//**
If a transaction has an implicit x-lock on a record, but no explicit x-lock
set on the record, sets one for it. */
static
void
lock_rec_convert_impl_to_expl(...){
  
  // 主键索引
 if (dict_index_is_clust(index)) {
  trx_id_t trx_id;

  trx_id = lock_clust_rec_some_has_impl(rec, index, offsets);
  // 判断事务是否是活跃状态
  trx = trx_rw_is_active(trx_id, NULL, true);
 } else { // 二级索引
        ...
 }

 if (trx != 0) {
  ulint heap_no = page_rec_get_heap_no(rec);

  ut_ad(trx_is_referenced(trx));

  /* If the transaction is still active and has no
  explicit x-lock set on the record, set one for it.
  trx cannot be committed until the ref count is zero. */
  // 把 rec 记录上的隐式锁转换为显式锁
  lock_rec_convert_impl_to_expl_for_trx(
   block, rec, index, offsets, trx, heap_no);
 }
}

Compact 行格式

InnoDB 存储引擎的 Compact 行格式见下图。

MySQL delete + insert 导致二级唯一索引申请两组 next-key lock 原因分析

其中：

一条完整的记录可以分为两部分，包括记录的额外信息和记录的真实数据；
记录的额外信息包括变长字段长度列表、NULL 值列表、记录头信息；
源码中rec_t指针对象指在记录的真实数据开始处，因此往后保存字段值，往前保存记录头信息。

记录头信息很重要，由固定的 5 字节组成，用于描述记录的一些属性。

5 字节也就是 40 个二进制位，具体二进制位代表的详细信息见下表。

名称	bit	说明
预留位	1	没有使用
预留位	1	没有使用
DELETE_MASK	1	标记该记录是否被删除，1 表示删除
MIN_RES_MASK	1	标记该记录是否为B+树的非叶子节点中的最小记录
N_OWNED	4	当该值为非0时，表示当前记录占用 page directory 里一个slot，并和前一个slot之间存在这么多个记录
HEAP_NO	13	索引堆中该条记录的排序记录 heap no
RECORD_TYPE	3	记录类型，000表示普通，0001表示B+树叶子节点指针，010表示 Infimum，011表示 Supremum，1xx表示保留
REC_NEXT	16	页中下一条记录和当前记录的相对位置偏移量，数据页内部通过链表串联各个行记录

记录头信息示意图如下所示。

其中：

记录的真实数据中还保存了三个系统列，包括主键 DB_ROW_ID、6 字节的事务 ID DB_TRX_ID、7 字节的回滚段指针 DB_ROLL_PTR。

GDB

断点

开启事务后，先执行 delete 语句，然后在执行 insert 语句前在函数lock_rec_lock设置断点。

两条语句如下所示，删除后指定相同主键插入。

delete from t_dupp where age=2 and name='b';

insert into t_dupp value(2, 2, 'b', 0);

由于删除的行记录对应唯一键 (2,b)，对应主键 2，因此下面简称该行为 r2b2。

主键唯一性检查

调用row_ins_duplicate_error_in_clust函数进行主键唯一性检查。

其中：

index=0x0000000150f0fd10，表示主键索引；
heap_no=3，表示 r2b2 在数据页中的 heap_no 等于 3；
rec=”x80″，表示 r2b2 的行记录首地址；
mode=LOCK_S，gap_mode=0，mode=2，表示主键冲突时加 S 型 next-key lock。

堆栈中显示lock_clust_rec_read_check_and_lock函数中包括数据行地址 rec，因此在该函数设置断点，查看行数据。

其中：

rec_t 的数据类型是是字节指针，可以理解为字节数组，用于保存行记录首地址；
从首地址往后查看 32 字节，其中前 4 字节是主键，之后 6 字节是事务 id，然后 7 字节是回滚段指针，再往后就是行中其他字段，显示 age=2，name=’b’，a=0；
从首地址往前查看 5 字节，其中第 5 字节的第 2 位表示该行是否已删除（DELETE_MASK），显示 1 表示该行已删除。也就是说唯一性检查查看是否有主键值为 2 的记录时找到了处于 delete-mark 状态的 r2b2。

GDB 命令：

p，表示打印变量数据；

x，表示查看内存变量；

xb，表示以十六进制格式查看。

插入主键

唯一性检查后插入主键值为 2 的记录。

其中：

插入主键时不是新增一条记录，而是直接在标记为 delete-mark 的旧记录上执行 update 操作，所以加锁时 heap_no=3；
mode=1027，原因是主键索引 update 操作加 X 型 record lock。

二级唯一键唯一性检查

调用row_ins_scan_sec_index_for_duplicate函数进行二级唯一键唯一性检查。

其中：

index=0x000000013411ed20，表示二级唯一索引；
heap_no=3，表示第二行记录 r2b2；
mode=LOCK_S，gap_mode=0，mode=2，表示二级唯一键冲突时加 S 型 next-key lock。

到这里都是正常的，下面就是见证奇迹的时刻。

其中：

heap_no=4，表示第三行记录 r3c3；
mode=LOCK_S，gap_mode=0，mode=2，表示二级唯一键冲突时给冲突记录的下一行也加 S 型 next-key lock。

插入二级唯一键

唯一性检查后插入唯一键值为 2b 的记录。

其中：

插入二级唯一键时也不是新增一条记录，而是直接在标记为 delete-mark 的旧记录上执行 update 操作，所以加锁时 heap_no=3；
mode=1027，原因是二级唯一索引 update 操作加 X 型 record lock。

源码

主键索引

row_ins_clust_index_entry_low函数中主键唯一性检查的函数是row_ins_duplicate_error_in_clust，关键代码如下所示。

   // 可重复读以下，加索引记录锁；可重复读以上，加Next-Key锁
   lock_type =
    trx->isolation_level <= TRX_ISO_READ_COMMITTED
    ? LOCK_REC_NOT_GAP : LOCK_ORDINARY;

   /* We set a lock on the possible duplicate: this
   is needed in logical logging of MySQL to make
   sure that in roll-forward we get the same duplicate
   errors as in original execution */

   if (flags & BTR_NO_LOCKING_FLAG) {
    /* Do nothing if no-locking is set */
    err = DB_SUCCESS;
   } else if (trx->duplicates) { // TRX_DUP_REPLACE

    /* If the SQL-query will update or replace
    duplicate key we will take X-lock for
    duplicates ( REPLACE, LOAD DATAFILE REPLACE,
    INSERT ON DUPLICATE KEY UPDATE). */
    // 对于RR，非insert主键冲突时加 X 型 next-key 锁
    err = row_ins_set_exclusive_rec_lock(
     lock_type,
     btr_cur_get_block(cursor),
     rec, cursor->index, offsets, thr);
   } else {
    // 对于RR，insert主键冲突时加显式锁，S 型 next-key 锁，可能会等待
    err = row_ins_set_shared_rec_lock(
     lock_type,
     btr_cur_get_block(cursor), rec,
     cursor->index, offsets, thr);
   }

其中：

对于 RR，insert on duplicate 检测到主键冲突时加 X 型 next-key lock；
对于 RR，insert 检测到主键冲突时加 S 型 next-key lock。

因此，前文中主键唯一性检查时加锁类型为 mode=LOCK_S，gap_mode=0。

唯一性检查通过后插入主键。

 /* Note: Allowing duplicates would qualify for modification of
 an existing record as the new entry is exactly same as old entry.
 Avoid this check if allow duplicates is enabled. */
 // 如果 rec_t 不是 delete mark，向上层返回 duplicate key 错误，
 // 如果是 delete mark，将插入流程转为 update-in-place 覆盖写入（row_ins_clust_index_entry_by_modify）。
 if (!index->allow_duplicates && row_ins_must_modify_rec(cursor)) {
  /* There is already an index entry with a long enough common
  prefix, we must convert the insert into a modify of an
  existing record */
  mem_heap_t* entry_heap = mem_heap_create(1024);

  /* If the existing record is being modified and the new record
  doesn't fit the provided slot then existing record is added
  to free list and new record is inserted. This also means
  cursor that we have cached for SELECT is now invalid. */
  if(index->last_sel_cur) {
   index->last_sel_cur->invalid = true;
  }

  err = row_ins_clust_index_entry_by_modify(
   &pcur, flags, mode, &offsets, &offsets_heap,
   entry_heap, entry, thr, &mtr);

  if (err == DB_SUCCESS && dict_index_is_online_ddl(index)) {
   row_log_table_insert(btr_cur_get_rec(cursor), entry,
          index, offsets);
  }

  mtr_commit(&mtr);
  mem_heap_free(entry_heap);
 }

其中：

如果冲突记录没有被标记为 delete-mark，向上层返回 duplicate key 错误；
如果冲突记录被标记为 delete-mark，会将插入转为更新覆盖写入 update-in-place，具体是调用row_ins_clust_index_entry_by_modify函数。

主键插入的主函数row_ins_clust_index_entry_low的函数描述中也有说明。

/***************************************************************//**
Tries to insert an entry into a clustered index, ignoring foreign key
constraints. If a record with the same unique key is found, the other
record is necessarily marked deleted by a committed transaction, or a
unique key violation error occurs. The delete marked record is then
updated to an existing record, and we must write an undo log record on
the delete marked record.
@retval DB_SUCCESS on success
@retval DB_LOCK_WAIT on lock wait when !(flags & BTR_NO_LOCKING_FLAG)
@retval DB_FAIL if retry with BTR_MODIFY_TREE is needed
@return error code */
dberr_t
row_ins_clust_index_entry_low(

其中：

主键索引插入时，如果冲突记录的唯一键与要插入的记录一样，而且该记录被标记为 delete-mark，将插入转为原地更新，并将旧记录写入 undo log。

二级唯一索引

row_ins_sec_index_entry_low函数中主键唯一性检查的函数是row_ins_scan_sec_index_for_duplicate，关键代码如下所示。

 // 返回时持有第一个满足条件记录所在的 page latch
 btr_pcur_open(index, entry, PAGE_CUR_GE,
        s_latch
        ? BTR_SEARCH_LEAF | BTR_ALREADY_S_LATCHED
        : BTR_SEARCH_LEAF,
        &pcur, mtr);

 allow_duplicates = thr_get_trx(thr)->duplicates; 

 /* Scan index records and check if there is a duplicate */
 do {
  const rec_t*  rec = btr_pcur_get_rec(&pcur);
  const buf_block_t* block = btr_pcur_get_block(&pcur);
  const ulint  lock_type = LOCK_ORDINARY;

  if (flags & BTR_NO_LOCKING_FLAG) {
   /* Set no locks when applying log
   in online table rebuild. */
  } else if (allow_duplicates) {

   /* If the SQL-query will update or replace
   duplicate key we will take X-lock for
   duplicates ( REPLACE, LOAD DATAFILE REPLACE,
   INSERT ON DUPLICATE KEY UPDATE). */

   // 添加 X 型锁
   err = row_ins_set_exclusive_rec_lock(
    lock_type, block, rec, index, offsets, thr);
  } else {
   
   // 普通 insert 添加 S 型锁
   err = row_ins_set_shared_rec_lock(
    lock_type, block, rec, index, offsets, thr);
  }
  // 扫描下一个记录，直到遇到第一个不同的记录
  } while (btr_pcur_move_to_next(&pcur, mtr));

其中：

无论哪个事务隔离级别，insert on duplicate 检测到二级唯一键冲突时加 X 型 next-key lock；
无论哪个事务隔离级别，insert 检测到二级唯一键冲突时加 S 型 next-key lock。

注意即使 RC 也要加 next-key lock，这是 RC 中为数不多的给记录添加间隙锁的场景。

加锁逻辑外部有一个 do-while 循环，可是理论上唯一键最多只会有一条冲突记录，为什么需要使用循环呢？下文中将回答。

使用循环时的关键在于循环的退出条件，注意循环中是先加锁后进行唯一键检查。

 do {
  const rec_t*  rec = btr_pcur_get_rec(&pcur);
  const buf_block_t* block = btr_pcur_get_block(&pcur);
  const ulint  lock_type = LOCK_ORDINARY;

    // 加锁
    if (allow_duplicates) {
   err = row_ins_set_exclusive_rec_lock(
    lock_type, block, rec, index, offsets, thr);
  } else {
   err = row_ins_set_shared_rec_lock(
    lock_type, block, rec, index, offsets, thr);
  }
  
  // 判断唯一键 key 是否重复，注意是先加锁后比较，cmp=0 表示重复
  cmp = cmp_dtuple_rec(entry, rec, offsets);

  // 唯一键重复，并且不允许重复
  if (cmp == 0 && !index->allow_duplicates) {
   
   // 判断冲突行是否已经删除，如果已经删除，则不冲突，返回 false, 如果不是删除的行，则冲突，返回 true
   if (row_ins_dupl_error_with_rec(rec, entry,
       index, offsets)) {
        
    // 报错唯一键冲突
    err = DB_DUPLICATE_KEY;
        
    goto end_scan;
   }
  } else {
   // ut_a 断言，表达式不为真时退出
   // 如果不重复或者重复但是允许重复时，执行下一行退出循环
   ut_a(cmp < 0 || index->allow_duplicates);
   goto end_scan;
  }
 // 扫描下一个记录，直到遇到第一个不同的记录
 } while (btr_pcur_move_to_next(&pcur, mtr));

其中有两个函数：

cmp_dtuple_rec函数用于判断唯一键是否重复，返回 cmp=0 表示重复；
row_ins_dupl_error_with_rec函数用于判断冲突行是否已经删除，返回 false 表示已删除因此不冲突，返回 true 表示未删除因此冲突。因此**row_ins_dupl_error_with_rec函数决定了加几个 next-key lock**。

row_ins_dupl_error_with_rec函数中依次判断索引的唯一性与冲突列是否删除。

 // 获取唯一索引定义包含的列个数
 n_unique = dict_index_get_n_unique(index);

 matched_fields = 0;

 cmp_dtuple_rec_with_match(entry, rec, offsets, &matched_fields);

 if (matched_fields < n_unique) {

  return(FALSE);
 }

 /* In a unique secondary index we allow equal key values if they
 contain SQL NULLs */

 if (!dict_index_is_clust(index) && !index->nulls_equal) {

  for (i = 0; i < n_unique; i++) {
   if (dfield_is_null(dtuple_get_nth_field(entry, i))) {

    return(FALSE);
   }
  }
 }
 
 // 判断冲突行是否已经删除
 return(!rec_get_deleted_flag(rec, rec_offs_comp(offsets)));

其中：

dict_index_get_n_unique函数用于获取唯一索引定义包含的列个数；
rec_get_deleted_flag函数用于判断冲突行是否已经删除。如果已经删除，则不冲突，返回 false, 如果不是删除的行，则冲突，返回 true；
如果唯一索引包含 NULL 字段，也认为不冲突，因此 MySQL 中二级唯一键允许多个 NULL。

因此，无论 delete 后 insert 时是否复用主键，row_ins_dupl_error_with_rec函数都会判断冲突行已删除，所以需要扫描下一行继续判断。

那么为什么冲突行被删除还需要扫描下一行？下文中将回答。

唯一性检查通过后插入二级唯一键。

row_ins_sec_index_entry_low函数中调用row_ins_must_modify_rec函数，返回 true 时同样调用row_ins_clust_index_entry_by_modify函数将插入转为原地更新。

 if (row_ins_must_modify_rec(&cursor)) {
  /* There is already an index entry with a long enough common
  prefix, we must convert the insert into a modify of an
  existing record */
  offsets = rec_get_offsets(
   btr_cur_get_rec(&cursor), index, offsets,
   ULINT_UNDEFINED, &offsets_heap);

  // 将 insert 转换成 update-in-place
  err = row_ins_sec_index_entry_by_modify(
   flags, mode, &cursor, &offsets,
   offsets_heap, heap, entry, thr, &mtr);

  if (err == DB_SUCCESS && dict_index_is_spatial(index)
      && rtr_info.mbr_adj) {
   err = rtr_ins_enlarge_mbr(&cursor, thr, &mtr);
  }
 }

row_ins_must_modify_rec函数用于判断是否可以复用 delete-mark 记录，其中调用dict_index_get_n_unique_in_tree函数。

row_ins_must_modify_rec(
/*====================*/
 const btr_cur_t* cursor) /*!< in: B-tree cursor */
{
 /* NOTE: (compare to the note in row_ins_duplicate_error_in_clust)
 Because node pointers on upper levels of the B-tree may match more
 to entry than to actual user records on the leaf level, we
 have to check if the candidate record is actually a user record.
 A clustered index node pointer contains index->n_unique first fields,
 and a secondary index node pointer contains all index fields. */

 return(cursor->low_match
        >= dict_index_get_n_unique_in_tree(cursor->index)
        && !page_rec_is_infimum(btr_cur_get_rec(cursor)));
}

dict_index_get_n_unique_in_tree函数用于获取唯一列，针对主键与二级唯一键调用不同的函数。

dict_index_get_n_unique_in_tree(
/*============================*/
 const dict_index_t* index) /*!< in: an internal representation
     of index (in the dictionary cache) */
{
 
 if (dict_index_is_clust(index)) {
  
  return(dict_index_get_n_unique(index));
 }

 return(dict_index_get_n_fields(index));
}

这两个函数分别获取索引对象 index 的 n_uniq 和 n_fields 字段。

 unsigned n_uniq:10;/*!< number of fields from the beginning
    which are enough to determine an index
    entry uniquely */
 unsigned n_fields:10;/*!< number of fields in the index */

其中：

主键索引获取 n_uniq 就是定义的主键定义包含的列个数；
二级唯一索引获取 n_felds 是索引定义列数加主键包含的列。

row_ins_must_modify_rec的函数描述中也有说明。

/***************************************************************//**
Checks if an index entry has long enough common prefix with an
existing record so that the intended insert of the entry must be
changed to a modify of the existing record. In the case of a clustered
index, the prefix must be n_unique fields long. In the case of a
secondary index, all fields must be equal.  InnoDB never updates
secondary index records in place, other than clearing or setting the
delete-mark flag. We could be able to update the non-unique fields
of a unique secondary index record by checking the cursor->up_match,
but we do not do so, because it could have some locking implications.
@return TRUE if the existing record should be updated; FALSE if not */
UNIV_INLINE
ibool
row_ins_must_modify_rec(

其中：

主键索引，将插入转为更新的前提是主键索引相同；
二级唯一索引，将插入转为更新的前提是二级唯一索引 + 主键索引相同。

函数描述中有句话很重要，二级索引从不直接原地更新，需要先删除后复用 delete-mark 的记录。

InnoDB never updates secondary index records in place, other than clearing or setting the delete-mark flag.

实际上，对于 update 操作，首先需要判断是否需要更新某个二级索引。只有当主键的 n_uniq 或唯一二级索引的 n_uniq 被修改时才需要更新二级索引，并且二级索引的更新操作会采用 delete+insert 的流程。

这部分逻辑本文就不介绍了，但是从中可以发现 delete 与 update 操作都可能产生 delete-mark 的记录，从而影响到后续的 insert。

主键索引允许直接原地更新，而二级索引不允许直接原地更新的原因与 MVCC 有关，下文中将解释原因。

对比

根据 GDB 与源码，主键与二级唯一键插入时都可以将插入转为原地更新从而复用 delete-mark 的记录，但是也有区别：

主键，当主键索引值相同时就可以原地更新；
二级唯一键，只有当二级唯一索引 + 主键索引值完全相同时才可以原地更新。

对于二级唯一键，唯一性检查用的是二级唯一索引，而判断是否可以原地更新用的是二级唯一索引 + 主键索引。

因此，理论上可能同时存在多个二级唯一索引相同但主键不同的 delete-mark 记录。所以为了保证唯一性，插入二级唯一键时需要给所有被 delete-mark 的 rec_t 以及第一个 sk 不相同的 rec_t 都加上 next-key lock。

引用阿里云的内核月报中提供的一个场景。

比如只包含 pk, sk 的一个 table。其中 pk 指主键，sk 指二级唯一键。

已经存在的二级索引记录 <1, 1>, <4, 2>, <10(delete-mark), 3>, <10(d), 8>, <10(d), 11>, <10(d), 21>, <15, 9> 需要插入二级索引<10, 6>, 那么就需要给<10, 3>, <10, 8>,<10,11>,<10,21>, <15, 9> 都加上next-key lock。

注意: 这里 <15, 9>也需要加上 next-key lock, 为的是保证像 <10, 100> 这样的 record 也不允许插入的. 但是如果这里 <15, 9> 是 <15000, 9> 那么这里被锁住的 gap 区间就非常非常大了。

因此二级唯一键唯一性检查时需要循环加锁，如果遇到相同唯一键的多个删除操作，将持续向后加锁。

那么，为什么插入主键不需要循环加锁呢？

原因是主键与二级唯一键判定 delete-mark 是否可以 update-in-place 的规则不同。

对于二级唯一键，如果二级唯一键相同但主键不同，二级唯一索引不可以 update-in-place delete-mark record，因此可能存在多条二级唯一键相同的记录；
对于主键，如果主键相同，主键索引就可以 update-in-place delete-mark record，因此最多存在一条主键相同的记录。

插入二级唯一键需要加 next-key lock 的原因是插入过程被分为两个阶段：

1）判断当前的物理记录上是否有冲突的 record（delete-mark 认为不冲突）；

2）如果没有冲突, 那么可以执行插入操作。

这里在阶段 1 和阶段2 之间必须有锁来保证（可以是 lock, 也可以是 latch），否则阶段 1 判断没有冲突可以插入的时候, 但是在阶段 1和阶段 2 之间另外一个事务插入了一个冲突的 record, 那么阶段 2 再插入的时候其实是冲突了。

所以当前的实现如果 gap 上存在至少一个相同的 record, 大概率是 delete-mark record, 那么需要给整个 range 都加上 gap X lock, 加了 gap X lock 以后就可以禁止其他事务在这个 gap 区间插入数据, 也就是通过 lock 来保证阶段 1 和阶段 2 的原子性。

如果 gap 上没有相同的 record, 那么就不需要进任何 gap lock。

实际上，在 delete + insert, insert … on duplicate key update, replace into 等场景中, 为了实现判断插入记录与现有物理记录是否冲突和插入记录这两个阶段的原子, unique check 的时候会给所有的相同的 record 和下一个 record 加上 next-key lock。导致后续 insert record 虽然没有冲突, 但是还是会被 Block 住, 进而有可能造成死锁的问题。

但是测试显示这里 insert … on duplicate key update 也加多个 next-key lock 的结论的成立有条件，具体将在下一篇文章中讲解。

知识点

MVCC

Compact 行格式中提到，InnoDB 中包括三个隐藏列：

DB_ROW_ID：行号，隐式主键，写入数据时自动维护的自增列；
DATA_TRX_ID：事务 ID，记录每一行最近一次修改它的事务 ID；
DATA_ROLL_PTR：指向当前数据 rollback_segment 的 undo log 记录，回滚数据就是通过这个指针；

需要注意的是隐藏列添加在主键索引上，二级索引上没有。

undo log 基于回滚指针（DATA_ROLL_PTR）维护数据行的所有版本，它指向当前数据行的上一个版本数据。

MVCC 基于 undo log 实现。

对于更新操作，每次对记录进行改动，都会记录一条 undo log，作为该记录的一个旧版本。

每条 undo log 都有一个 roll_pointer 属性，指向更早版本的 undo log。

随着更新次数的增加，所有的版本都会被 roll_pointer 属性连接成一个链表，称为版本链，版本链的头节点就是当前记录最新的值。

假设多个事务分别更新 name 字段，所有的版本都可以通过 MVCC 查到，通过非锁定读取解决读写冲突。

MVCC 的存在导致主键索引与二级索引更新操作实现方式的不同：

主键索引可以通过隐藏列 DB_ROLL_PTR 定位到历史版本，因此允许原地更新 updated-in-place；
二级索引无法通过隐藏列定位到历史版本，因此不允许原地更新，update 操作需要被转换成 delete + insert 操作。

因此，主键索引允许原地更新，而二级索引不允许原地更新。

具体对于主键索引，在 delete 之后又 insert 一个数据，会将该 record delete marked 标记改成 non-delete marked，然后记录一个 delete marked 的record 在 undo log 里面，这样如果有历史版本的查询，会通过 mvcc 从 undo log 中恢复该数据。因此不会出现相同的 delete mark record 跨多个 page 的情况。

delete-mark

InnoDB 中删除数据并不会将数据行从存储空间中移除，而是把该记录的 delete_mask 值设置为 1。

如果删除后再次将这条记录插入到表中，并不会因为新记录的插入而为它申请新的存储空间，而是直接复用原来被删除记录的存储空间。

个人理解这样实现的原因是可以通过空间复用减少碎片。

当然，如果插入其他数据，就不一定可以空间复用了。

具体是在插入操作入口函数page_cur_insert_rec_low中进行处理。

首先从 PAGE_FREE 链表中尝试获取足够的空间。仅仅比较链表头的一个记录，如果这个记录的空间大于需要插入的记录的空间，则复用这块空间（包括 heap_no），否则就从 PAGE_HEAP_TOP 分配空间。如果这两个地方都没有，则返回空。这里注意一下，由于只判断 Free 链表的第一个头元素，所以算法对空间的利用率不是很高，估计也是为了操作方便。

假设，某个数据页首先删除了几条大的记录，但是最后一条删除的是比较小的记录 A，那么后续插入的记录大小只有比记录 A 还小，才能把 Free 链表利用起来。

举个例子，假设先后删除记录的大小为 4K, 3K, 5K, 2K，那么只有当插入的记录小于 2K 时候，这些被删除的空间才会被利用起来，假设新插入的记录是 0.5K，那么 Free 链表头的 2K，可以被重用，但是只是用了前面的 0.5K，剩下的 1.5K 依然会被浪费，下次插入只能利用 5K 记录所占的空间，并不会把剩下的 1.5K 也利用起来。

这些特性，从底层解释了，为什么 InnoDB 那么容易产生碎片，经常需要进行空间整理。

结论

insert 检测到主键冲突时：

加锁类型，对于 RR，加 S 型 next-key lock，对于 RC，加 S 型 record lock；
加锁范围，给冲突记录加锁。

insert 检测到二级唯一键冲突时：

加锁类型，加 S 型 next-key lock，加锁类型与事务隔离级别无关，因此包括 RC；
加锁范围，给所有的相同的 record（delete-mark record）和下一个 record 加锁，与是否更新主键无关。

主键与二级唯一键插入时都可以将插入转为更新覆盖写入 update-in-place 从而复用 delete-mark 的记录，但是也有区别：

主键，当主键索引值相同时就可以原地更新；
二级唯一键，只有当二级唯一索引 + 主键索引值完全相同时才可以原地更新。

对于二级唯一键，唯一性检查用的是二级唯一索引，而判断是否可以原地更新用的是二级唯一索引 + 主键索引。

二级唯一键唯一性检查需要循环加锁，而主键唯一性检查不需要循环加锁的原因是：

对于二级唯一键，如果二级唯一键相同但主键不同，二级唯一索引不可以 update-in-place delete-mark record，因此可能存在多条二级唯一键相同的记录；
对于主键，如果主键相同，主键索引就可以 update-in-place delete-mark record，因此最多存在一条主键相同的记录。

在 RC 隔离级别下，要让 insert 操作达到完全的并发执行，需要有两个前提条件：

1）所操作的表没有唯一索引；

2）若存在唯一索引，插入的数据不会引发唯一性冲突。（补充说明下，这里的唯一性索引不包含聚集索引）

对于 insert 操作来说，若发生唯一约束冲突，则需要对冲突的唯一索引加 next-key 共享读锁。而且还需要对该唯一索引的下一条记录也加 next-key 共享读锁。