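Today our production database (Aurora PostgreSQL 9.6.3) hit a deadlock in which multiple processes were trying to execute the same UPDATE query on a single row. We believed deadlocks could only occur when multiple rows are updated in an inconsistent order, so this came as a surprise; nevertheless, it happened during the busiest moment of our day.
Here is the transaction in our Python code that contains the UPDATE statement (it is a poor man's UPSERT):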
with self.connection.cursor() as cursor:
    cursor.execute("""UPDATE students SET name = %s WHERE uuid = %s AND activity_id = %s""", (name, uuid, activityId))
    if cursor.rowcount <= 0:
        cursor.execute("""INSERT INTO students (name, uuid, activity_id) VALUES (%s, %s, %s)""", (name, uuid, activityId))
    if cursor.rowcount <= 0:
        self.connection.rollback()
        raise BaseDao.NotUpserted("No student name was updated or inserted for activity_id %d and uuid %s" % (activityId, uuid))
    else:
        self.connection.commit()
Here are some relevant lines from the log, including the simple query that updates only one row:
...
2018-01-19 16:21:27 UTC:[38161]:ERROR: deadlock detected
2018-01-19 16:21:27 UTC:[38161]:DETAIL: Process 38161 waits for ShareLock on transaction 90490253; blocked by process 25147.
Process 25147 waits for ShareLock on transaction 90490267; blocked by process 38161.
Process 38161: UPDATE students SET name = 'foobar' WHERE uuid = 'ca1b2d153cbdc9574cce' AND activity_id = 35473237
Process 25147: UPDATE students SET name = 'foobar' WHERE uuid = 'ca1b2d153cbdc9574cce' AND activity_id = 35473237
...
Here are the two relevant tables:
db=> \d students
Table "public.students"
Column | Type | Modifiers
-------------+------------------------+-------------------------------------------------------------------
id | integer | not null default nextval('students_id_seq'::regclass)
name | character varying(128) | not null
uuid | character varying(40) | not null
activity_id | integer | not null
Indexes:
"students_pkey" PRIMARY KEY, btree (id)
"students_activity_id" btree (activity_id)
Foreign-key constraints:
"activity_id_refs_id_76c08098" FOREIGN KEY (activity_id) REFERENCES activities(id) DEFERRABLE INITIALLY DEFERRED
db=> \d activities
Table "public.activities"
Column | Type | Modifiers
-------------------+--------------------------+----------------------------------------------------------------------
id | integer | not null default nextval('activities_id_seq'::regclass)
start_time | timestamp with time zone | not null
end_time | timestamp with time zone |
activity_type | character varying(2) | not null
activity_id | integer | not null
started_by_id | integer | not null
activity_state | integer | not null
legacy_id | integer |
hide_report | boolean | not null
report_status | integer |
students_finished | text | not null
room_name | text |
last_updated | timestamp with time zone |
state | integer |
Indexes:
"activities_pkey" PRIMARY KEY, btree (id)
"activities_end_time" btree (end_time)
"activities_room_name_c1f9997a_like" btree (room_name text_pattern_ops)
"activities_room_name_c1f9997a_uniq" btree (room_name)
"activities_started_by_id" btree (started_by_id)
Foreign-key constraints:
"started_by_id_refs_id_5ea35c7a" FOREIGN KEY (started_by_id) REFERENCES users(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "students" CONSTRAINT "activity_id_refs_id_76c08098" FOREIGN KEY (activity_id) REFERENCES activities(id) DEFERRABLE INITIALLY DEFERRED
How can we end up in a deadlock like this when only a single row is being updated?
Answer 0 (score: 0)
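I can think of two causes for such a deadlock:
1. The transactions that perform the update contain several statements, and the other statements also take locks.
2. Triggers are involved that take additional locks.
Keep in mind that deadlocks are not a bug unless they occur often; failing to handle deadlocks is a bug. Simply retry the failed transaction.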
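Retrying the failed transaction could look roughly like the sketch below. It is only an illustration under assumptions, not code from the question: `run_with_deadlock_retry` and `is_pg_deadlock` are hypothetical helper names, and the check assumes the driver exposes the SQLSTATE on the exception as a `pgcode` attribute (psycopg2 does; the question does not say which driver is used). `40P01` is PostgreSQL's `deadlock_detected` error code.

```python
import time


def is_pg_deadlock(exc):
    # Assumption: the driver exposes the SQLSTATE as `pgcode`.
    # 40P01 is PostgreSQL's deadlock_detected error code.
    return getattr(exc, "pgcode", None) == "40P01"


def run_with_deadlock_retry(transaction, rollback=None, is_deadlock=is_pg_deadlock,
                            max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run transaction() and retry it from the start if it hits a deadlock.

    transaction: callable doing the whole unit of work, including commit.
    rollback:    optional callable that resets the connection before a retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except Exception as exc:
            # The aborted transaction must be rolled back before the
            # connection can be used again.
            if rollback is not None:
                rollback()
            # Re-raise anything that is not a deadlock, and give up
            # once the attempts are exhausted.
            if not is_deadlock(exc) or attempt == max_attempts:
                raise
            # Wait a little longer after each failed attempt.
            sleep(base_delay * attempt)
```

With the code from the question, the whole UPDATE/INSERT/commit sequence would go into one function that is passed as `transaction`, with `self.connection.rollback` passed as `rollback`. The retry only hides the symptom; if deadlocks are frequent, the other statements or triggers that take the extra locks still need to be found.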