vlambda博客
学习文章列表

项目场景:一次线上MySQL死锁告警原因排查

最近处理了一次线上数据告警,记录一下。


问题描述

同步书架书籍的接口频繁抛出异常,提示数据库出现死锁,异常如下:

本日异常次数:2,异常日志:java.lang.RuntimeException: org.springframework.dao.DeadlockLoserDataAccessException: 
### Error updating database. Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ### The error may involve com.iflytek.hmreader.order.dao.UBookShelfMapper.batchInsertOrUpdate-Inline
### The error occurred while setting parameters
### SQL: insert into `um_user_bookshelf` (`app_id`, `channel_id`, `version_id`, `user_id`, `book_id`, `chapter_id`, `progress`,`listened`, `from_user_id`, `create_time`,`update_time`,`sync_time`) values (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) on duplicate key update chapter_id = values(chapter_id), progress = values(progress),listened = values(listened),update_time = values(update_time)
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ; SQL []; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ### The error may involve com.iflytek.hmreader.order.dao.UBookShelfMapper.batchInsertOrUpdate-Inline
### The error occurred while setting parameters
### SQL: insert into `um_user_bookshelf` (`app_id`, `channel_id`, `version_id`, `user_id`, `book_id`, `chapter_id`, `progress`,`listened`, `from_user_id`, `create_time`,`update_time`,`sync_time`) values (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) , (?,?,?,?,?,?,?,?,?,NOW(),?,?) on duplicate key update chapter_id = values(chapter_id), progress = values(progress),listened = values(listened),update_time = values(update_time)
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ; SQL []; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:263)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:75)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:447)
at com.sun.proxy.$Proxy70.insert(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:279)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:56)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53)
at com.sun.proxy.$Proxy115.batchInsertOrUpdate(Unknown Source)
at com.iflytek.hmreader.order.service.UmBookShelfServiceImpl.updBookShelf(UmBookShelfServiceImpl.java:134)
at com.alibaba.dubbo.common.bytecode.Wrapper10.invokeMethod(Wrapper10.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72)
at com.alibaba.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:53)
at com.alibaba.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:64)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:75)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:70)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:132)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:113)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:84)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:170)
at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:52)
at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3976)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3912)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2486)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1197)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:498)
at io.shardingsphere.shardingjdbc.executor.PreparedStatementExecutor$4.executeSQL(PreparedStatementExecutor.java:158)
at io.shardingsphere.shardingjdbc.executor.PreparedStatementExecutor$4.executeSQL(PreparedStatementExecutor.java:154)
at io.shardingsphere.core.executor.sql.execute.SQLExecuteCallback.execute0(SQLExecuteCallback.java:85)
at io.shardingsphere.core.executor.sql.execute.SQLExecuteCallback.execute(SQLExecuteCallback.java:69)
at io.shardingsphere.core.executor.ShardingExecuteEngine.syncGroupExecute(ShardingExecuteEngine.java:182)
at io.shardingsphere.core.executor.ShardingExecuteEngine.groupExecute(ShardingExecuteEngine.java:158)
at io.shardingsphere.core.executor.sql.execute.SQLExecuteTemplate.executeGroup(SQLExecuteTemplate.java:71)
at io.shardingsphere.core.executor.sql.execute.SQLExecuteTemplate.executeGroup(SQLExecuteTemplate.java:54)
at io.shardingsphere.shardingjdbc.executor.AbstractStatementExecutor.executeCallback(AbstractStatementExecutor.java:122)
at io.shardingsphere.shardingjdbc.executor.PreparedStatementExecutor.execute(PreparedStatementExecutor.java:161)
at io.shardingsphere.shardingjdbc.jdbc.core.statement.ShardingPreparedStatement.execute(ShardingPreparedStatement.java:139)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:46)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:74)
at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:50)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
at sun.reflect.GeneratedMethodAccessor210.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy116.update(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:198)
at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:185)
at sun.reflect.GeneratedMethodAccessor209.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:434)
... 34 more

at com.alibaba.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:108)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:75)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:70)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:132)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:91)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:113)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:84)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:170)
at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:52)
at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


原因分析:

提示:这里填写问题的分析:

我们采用的是5.7.17版本的mysql数据库,事务隔离级别是默认的RR(Repeatable-Read),采用innodb引擎。其中涉及的表为

CREATE TABLE `um_user_bookshelf_62` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
`app_id` varchar(50) NOT NULL DEFAULT '43' COMMENT '产品ID',
`channel_id` varchar(50) NOT NULL DEFAULT '0' COMMENT '渠道ID',
`version_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '版本ID',
`user_id` varchar(32) DEFAULT NULL,
`book_id` bigint(20) NOT NULL COMMENT '书籍ID',
`chapter_id` bigint(20) DEFAULT NULL COMMENT ' 章节编号',
`progress` varchar(50) DEFAULT NULL COMMENT '(阅读/听书)进度',
`listened` int(1) DEFAULT NULL COMMENT '上次是否是听书标志(1:看书、2:真人、3:TTS)',
`from_user_id` varchar(32) DEFAULT NULL COMMENT '书架资产转移账户ID',
`create_time` datetime NOT NULL COMMENT '创建时间',
`update_time` datetime NOT NULL COMMENT '用户在客户端最后阅读记录时间',
`sync_time` datetime NOT NULL DEFAULT '1970-01-01 00:00:00' COMMENT '用户最后获取更新书籍记录时间',
PRIMARY KEY (`id`),
UNIQUE KEY `um_shelf_index_user_id_book_id` (`user_id`,`book_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=653275640704925698 DEFAULT CHARSET=utf8 COMMENT='用户书架图书记录表';

表结构并不复杂,除去主键索引外还有一个唯一索引,那么现在就去看一下MySQL的日志,看一下在执行这个操作前后数据在执行什么操作。这里可以通过SHOW ENGINE INNODB STATUS来查看最近捕获到的死锁日志

=====================================
2022-04-01 10:41:02 0x7f051c0c6700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 2 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1658263 srv_active, 0 srv_shutdown, 46119058 srv_idle
srv_master_thread log flush and writes: 47773811
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1874466
OS WAIT ARRAY INFO: signal count 1843076
RW-shared spins 0, rounds 3405123, OS waits 1707831
RW-excl spins 0, rounds 197500, OS waits 3749
RW-sx spins 1429, rounds 40856, OS waits 1194
Spin rounds per wait: 3405123.00 RW-shared, 197500.00 RW-excl, 28.59 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-03-31 22:10:24 0x7f051c108700
*** (1) TRANSACTION:
TRANSACTION 949308891, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 34888817, OS thread handle 139658880173824, query id 31638369 172.22.149.74 ydorder update
insert into `um_user_bookshelf_32` (`app_id`, `channel_id`, `version_id`, `user_id`, `book_id`, `chapter_id`, `progress`,`listened`, `from_user_id`, `create_time`,`update_time`,`sync_time`, id) values ('43','201002',395,'302107202231477720',622312,2,'0',1,null,NOW(),'2021-10-23 23:18:22','1970-01-01 00:00:00', 716413229296390144), ('43','201002',395,'302107202231477720',631104,3,'195',1,null,NOW(),'2022-03-31 22:10:18','1970-01-01 00:00:00', 716413229296390145), ('43','201002',395,'302107202231477720',621438,0,'0',1,null,NOW(),'2021-10-23 23:22:19','1970-01-01 00:00:00', 716413229296390146), ('43','201002',395,'302107202231477720',675781,0,'0',1,null,NOW(),'2021-10-23 23:22:39','1970-01-01 00:00:00', 716413229296390147), ('43','201002',395,'302107202231477720',678183,0,'0',1,null,NOW(),'2021-10-23 23:23:06','1970-01-01 00:00:00', 716413229296390148), ('43','201002',395,'302107202231477720',678455,0,
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5436 page no 63 n bits 464 index um_shelf_index_user_id_book_id of table `sharding_3`.`um_user_bookshelf_32` trx id 949308891 lock_mode X waiting
Record lock, heap no 150 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 18; hex 333032313037323032323331343737373230; asc 302107202231477720;;
1: len 8; hex 8000000000097ee8; asc ~ ;;
2: len 8; hex 88ab8aab239a1004; asc # ;;

*** (2) TRANSACTION:
TRANSACTION 949308890, ACTIVE 0 sec inserting, thread declared inside InnoDB 4994
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 3
MySQL thread id 34888816, OS thread handle 139659922409216, query id 31638370 172.22.149.73 ydorder update
insert into `um_user_bookshelf_32` (`app_id`, `channel_id`, `version_id`, `user_id`, `book_id`, `chapter_id`, `progress`,`listened`, `from_user_id`, `create_time`,`update_time`,`sync_time`, id) values ('43','201002',395,'302107202231477720',622312,2,'0',1,null,NOW(),'2021-10-23 23:18:22','1970-01-01 00:00:00', 716413229304774656), ('43','201002',395,'302107202231477720',631104,4,'0',1,null,NOW(),'2022-03-31 22:10:20','1970-01-01 00:00:00', 716413229304774657), ('43','201002',395,'302107202231477720',621438,0,'0',1,null,NOW(),'2021-10-23 23:22:19','1970-01-01 00:00:00', 716413229304774658), ('43','201002',395,'302107202231477720',675781,0,'0',1,null,NOW(),'2021-10-23 23:22:39','1970-01-01 00:00:00', 716413229304774659), ('43','201002',395,'302107202231477720',678183,0,'0',1,null,NOW(),'2021-10-23 23:23:06','1970-01-01 00:00:00', 716413229304774660), ('43','201002',395,'302107202231477720',678455,0,'0
*** (2) HOLDS THE LOCK(S): 事务2持有的锁
RECORD LOCKS space id 5436 page no 63 n bits 464 index um_shelf_index_user_id_book_id of table `sharding_3`.`um_user_bookshelf_32` trx id 949308890 lock_mode X
Record lock, heap no 150 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 18; hex 333032313037323032323331343737373230; asc 302107202231477720;;
1: len 8; hex 8000000000097ee8; asc ~ ;;
2: len 8; hex 88ab8aab239a1004; asc # ;;

Record lock, heap no 154 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 18; hex 333032313037323032323331343737373230; asc 302107202231477720;;
1: len 8; hex 800000000009a140; asc @;;
2: len 8; hex 88ab8aab239a100a; asc # ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5436 page no 63 n bits 464 index um_shelf_index_user_id_book_id of table `sharding_3`.`um_user_bookshelf_32` trx id 949308890 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 150 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 18; hex 333032313037323032323331343737373230; asc 302107202231477720;;
1: len 8; hex 8000000000097ee8; asc ~ ;;
2: len 8; hex 88ab8aab239a1004; asc # ;;

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 949318674
Purge done for trx'
s n:o < 949318646 undo n:o < 0 state: running but idle
History list length 19
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421145291578096, not started
0 lock struct(s), heap size 1136, 0 row lock(s)

死锁日志的结构很清晰,主要分为两部分 ,*** (1) TRANSACTION*** (2) TRANSACTION,其中事务1正在执行insert操作,该操作正在申请索引um_shelf_index_user_id_book_id的X锁(排它锁),所以会提示lock_mode X waiting。再看事务2的部分,它持有了S锁,并在申请lock_mode X locks gap。这里由于是添加的是唯一索引,申请同一个X锁造成了循环等待,最后由于事务1的权重比较小,被选择了回滚。

从日志里面可以看出 这是个批量插入操作,两个事物操作的数据是一样,由于添加了userId+bookId的唯一索引,两个事物之间会出现互相申请X锁的情况,陷入等待循环,造成了死锁。到这里可以推断是由于并发请求导致的,然后我去翻看了一下nginx网关的访问日志,发现确实是这样,同一时间 端发起了多次重复请求,如下图所示:

解决方案:

端调用的代码需要排查,同时接口需要做幂等性处理。