MySQL: slave_max_allowed_packet and max_allowed_packet in master-slave replication

Brief notes based on 5.7.22; 5.6.25 behaves the same way.

1. slave_max_allowed_packet in replication

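This parameter controls the maximum size of an event that the SQL thread can read. The default is 1G, and it should normally be left unchanged. The check in Log_event::read_log_event() is: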

  if (data_len > max_size)
  {
    error = "Event too big";
    goto err;
  }
  /* ... later, on the error path of the same function: */
  sql_print_error("Error in Log_event::read_log_event(): "
                  "'%s', data_len: %lu, event_type: %d",
                  error, data_len, head[EVENT_TYPE_OFFSET]);

When this happens, the slave reports:

Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: 
the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the
 slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network 
problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or 
slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

The error log shows:

2021-10-18T07:21:17.431850Z 24 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 16034, event_type: 30
2021-10-18T07:21:17.431896Z 24 [ERROR] Error reading relay log event for channel '': slave SQL thread aborted because of I/O error
2021-10-18T07:21:17.431919Z 24 [ERROR] Slave SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2021-10-18T07:21:17.431931Z 24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'log_bin.000002' position 880125.

2. max_allowed_packet in replication

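This parameter mainly controls the maximum size of an event that the master's DUMP thread reads at a time. However, the DUMP thread overrides its own session max_allowed_packet to 1G when it starts, so it does not depend on the configured max_allowed_packet value.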

Binlog_sender::init
  /* Binary event can be vary large. So set it to max allowed packet. */
  thd->variables.max_allowed_packet= MAX_MAX_ALLOWED_PACKET;

MAX_MAX_ALLOWED_PACKET is a fixed value of 1024*1024*1024 (1G). If an event is larger than 1G, the read fails with LOG_READ_TOO_LARGE.

  data_len= uint4korr(buf + EVENT_LEN_OFFSET);
  if (data_len < LOG_EVENT_MINIMAL_HEADER_LEN ||
      data_len > max(current_thd->variables.max_allowed_packet, // fixed at 1G for the dump thread
                     opt_binlog_rows_event_max_size + MAX_LOG_EVENT_HEADER))
  {
    DBUG_PRINT("error",("data_len is out of bounds. data_len: %lu", data_len));
    result= ((data_len < LOG_EVENT_MINIMAL_HEADER_LEN) ? LOG_READ_BOGUS :
      LOG_READ_TOO_LARGE);
    goto end;
  }
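
The bound check in the excerpt can be restated as a small standalone function. This is only a sketch: the enum and function names are hypothetical stand-ins for the server's LOG_READ_* codes, and rows_event_limit stands for opt_binlog_rows_event_max_size + MAX_LOG_EVENT_HEADER, whose exact value is not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for the server's LOG_READ_BOGUS / LOG_READ_TOO_LARGE.
enum class ReadResult { Ok, Bogus, TooLarge };

// Size of the common event header (LOG_EVENT_MINIMAL_HEADER_LEN).
constexpr std::uint64_t kMinimalHeaderLen = 19;
// MAX_MAX_ALLOWED_PACKET as described in the text: 1024*1024*1024.
constexpr std::uint64_t kMaxMaxAllowedPacket = 1024ULL * 1024 * 1024;

// Classify an event length the same way the excerpt does.
// rows_event_limit is an assumption-labelled parameter standing for
// opt_binlog_rows_event_max_size + MAX_LOG_EVENT_HEADER.
ReadResult classify_event_length(std::uint64_t data_len,
                                 std::uint64_t max_allowed_packet,
                                 std::uint64_t rows_event_limit) {
  if (data_len < kMinimalHeaderLen) return ReadResult::Bogus;
  if (data_len > std::max(max_allowed_packet, rows_event_limit))
    return ReadResult::TooLarge;
  return ReadResult::Ok;
}
```

Because the dump thread passes 1G as max_allowed_packet, the second branch is only taken when the length field read from the header is larger than 1G.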

When this check fails with LOG_READ_TOO_LARGE, the slave reports:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry 
exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event '' at 4, the last
event read from '/opt/bin/log_bin.000002' at 1075335, the last byte read from '/opt/bin/log_bin.000002' at
1075354.'

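Clearly it is hard for an event to exceed 1G: a rows event is about 8K by default (binlog_row_event_max_size), and once that size is exceeded a new event is started. In practice, an event can only exceed 1G if a single row is itself larger than 1G. In that case, look at the position named in the error, "the last event read from '/opt/bin/log_bin.000002' at 1075335", and check whether the event there really is larger than 1G. This position is updated by the dump thread each time it reads an event: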

  if ((error= Log_event::read_log_event(log_cache, &m_packet, m_fdle.get(), NULL, checksum_alg,
                                        NULL, NULL, header)))
    goto read_error;

  set_last_pos(my_b_tell(log_cache)); // updates the "last event read" position

So this is the exact position up to which the dump thread has read events.
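
To check the event at that position by hand, the 19-byte common header can be decoded directly. The following is a minimal sketch, assuming the binlog v4 header layout (timestamp, type code, server id, event length at EVENT_LEN_OFFSET, end position, flags, all little-endian); the struct and function names are hypothetical and not part of the server. Note that in the error above, 1075354 - 1075335 = 19, which matches the header length and is consistent with only the header having been read before the length was rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

constexpr std::size_t kHeaderLen = 19;  // LOG_EVENT_MINIMAL_HEADER_LEN

// Fields of the common event header (hypothetical struct for illustration).
struct EventHeader {
  std::uint32_t timestamp;    // bytes 0..3
  std::uint8_t type_code;     // byte 4 (EVENT_TYPE_OFFSET)
  std::uint32_t server_id;    // bytes 5..8
  std::uint32_t event_len;    // bytes 9..12 (EVENT_LEN_OFFSET)
  std::uint32_t end_log_pos;  // bytes 13..16
  std::uint16_t flags;        // bytes 17..18
};

// Little-endian 32-bit read, the job uint4korr does in the server.
inline std::uint32_t read_le32(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode a header from a buffer of at least kHeaderLen bytes.
inline EventHeader decode_event_header(const unsigned char *buf) {
  EventHeader h;
  h.timestamp = read_le32(buf);
  h.type_code = buf[4];
  h.server_id = read_le32(buf + 5);
  h.event_len = read_le32(buf + 9);
  h.end_log_pos = read_le32(buf + 13);
  h.flags = static_cast<std::uint16_t>(buf[17] | (buf[18] << 8));
  return h;
}

// Read and decode the header found at a byte offset in a binlog file.
// Returns false if the file cannot be opened or is too short.
inline bool read_header_at(const std::string &path, std::uint64_t offset,
                           EventHeader *out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  in.seekg(static_cast<std::streamoff>(offset));
  unsigned char buf[kHeaderLen];
  in.read(reinterpret_cast<char *>(buf), kHeaderLen);
  if (in.gcount() != static_cast<std::streamsize>(kHeaderLen)) return false;
  *out = decode_event_header(buf);
  return true;
}
```

For the error above, one would call read_header_at("/opt/bin/log_bin.000002", 1075335, &h) on the master and compare h.event_len against 1G.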

3. Specifying a wrong pos

If a wrong pos is specified (CHANGE MASTER given an incorrect master_log_file/master_log_pos), the data_len that is read may be far too large, which leads to the same error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry 
exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event '' at 4, the last
event read from '/opt/bin/log_bin.000002' at 1075335, the last byte read from '/opt/bin/log_bin.000002' at
1075354.'

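This is because once the pos is wrong, the event header is read from the wrong place. The event length is taken from a fixed offset (EVENT_LEN_OFFSET) within that header, so data_len ends up holding meaningless data. The function that reads the event header is: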

  inline static bool peek_event_header(char *header, IO_CACHE *log_cache)
  {
    DBUG_ENTER("Log_event::peek_event_header");
    if (my_b_read(log_cache, (uchar*) header, LOG_EVENT_MINIMAL_HEADER_LEN))
      DBUG_RETURN(true);
    DBUG_RETURN(false);
  }
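
The effect of a misaligned pos can be reproduced on a fake event. This is an illustrative sketch only: the 100-byte event, its values, and the helper names are made up, and only the two header offsets (event length at byte 9, end position at byte 13) follow the v4 header layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kEventLenOffset = 9;  // EVENT_LEN_OFFSET
constexpr std::size_t kLogPosOffset = 13;   // end position of the event

// Store a 32-bit value in little-endian order at buf[at..at+3].
void put_le32(std::vector<unsigned char> &buf, std::size_t at, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    buf[at + i] = static_cast<unsigned char>(v >> (8 * i));
}

// Read the event length as if an event header started at pos.
std::uint32_t event_len_at(const std::vector<unsigned char> &buf, std::size_t pos) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(buf[pos + kEventLenOffset + i]) << (8 * i);
  return v;
}

// Hypothetical 100-byte event that starts at binlog offset 4, so its header
// records length 100 and end position 104. All other bytes are zero.
std::vector<unsigned char> make_fake_event() {
  std::vector<unsigned char> ev(100, 0);
  put_le32(ev, kEventLenOffset, 100);
  put_le32(ev, kLogPosOffset, 104);
  return ev;
}
```

Reading at the true start (pos 0) gives a length of 100. Reading one byte late (pos 1) picks up three bytes of the length field plus the low byte of the end position, giving 0x68000000 = 1744830464, which is larger than 1G and would be rejected as LOG_READ_TOO_LARGE.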

4. The metalink note on exceeded max_allowed_packet

That is, this error:

Got fatal error 1236 from master when reading data from binary log: 'log event entry 
exceeded max_allowed_packet; Increase max_allowed_packet on master;

It is possible that the master simply recorded a statement that was larger than max_allowed_packet 
permitted it to read. However, this is only likely if the connection that created the entry deliberately set a 
larger value, and would be easily solved by simply setting the global max_allowed_packet variable larger
 on the master. On 5.5.26 and later, ensure that the slave_max_allowed_packet variable is large enough.

More likely, the slave has incorrectly received some part of the binary log, and has requested data from 
the incorrect position. If the position does not accurately match the start of a real event in the binary log, it
 will receive a value for the packet length that is essentially nonsense data.

If the following solution does not solve the problem, then the master's error log may be corrupt, 
particularly if it exceeds 4GB in size.