Postgres deadlock with concurrent upserts

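Time: 2017-09-22 13:45:38

Tags: postgresql transactions deadlock database-deadlocks

We have an application that reads from a data stream and stores that information in a database. The data consists of changes made on Google Drive, which means that many events affecting the same objects can arrive very close together.

When upserting this information into the database we run into deadlocks. This is what appears in the logs; I have reconstructed and cleaned up the queries for readability: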

ERROR:  deadlock detected
DETAIL:  Process 10586 waits for ShareLock on transaction 166892743; blocked by process 10597.
  Process 10597 waits for ShareLock on transaction 166892741; blocked by process 10586.
  Process 10586: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

  Process 10597: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.167Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.167Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

HINT:  See server log for query details.
CONTEXT:  while locking tuple (3908269,11) in relation "documents"
STATEMENT:  
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

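I suspect the problem is something like this: the first transaction locks the rows with source_id alpha, beta, gamma, delta in that order, while the second transaction locks the rows with source_id delta, gamma in the opposite order, and the deadlock occurs once each holds one of gamma and delta and waits for the other. But the timing here is very tight!

What is the solution? Sorting our list of values by source_id?

2 Answers:

Answer 0: (score: 3)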

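I can think of three solutions:

  1. Insert only one row per statement, which is inefficient.

  2. Sort the rows before inserting them.

  3. Retry the transaction if it fails with a deadlock or serialization error.

Unless the errors happen frequently, I would prefer the third solution.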
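The retry option could be sketched in Python roughly as follows. This is a minimal sketch, not the poster's code: `run_transaction` is a hypothetical callable that runs the whole upsert transaction and rolls back on failure, and the check relies on the driver exposing the SQLSTATE as a `pgcode` attribute on the exception (psycopg2 does; other drivers may use a different name). The SQLSTATE values 40P01 (deadlock_detected) and 40001 (serialization_failure) are PostgreSQL's own.

```python
import random
import time

# SQLSTATE codes PostgreSQL reports for the two retryable failures.
DEADLOCK_DETECTED = "40P01"
SERIALIZATION_FAILURE = "40001"
RETRYABLE = {DEADLOCK_DETECTED, SERIALIZATION_FAILURE}


def run_with_retry(run_transaction, max_attempts=5, base_delay=0.05,
                   sleep=time.sleep):
    """Call run_transaction() until it succeeds or attempts run out.

    run_transaction is a hypothetical callable (an assumption of this
    sketch) that executes the complete transaction and rolls back if
    it raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except Exception as exc:
            # Assumption: the driver stores the SQLSTATE in `pgcode`.
            code = getattr(exc, "pgcode", None)
            if code not in RETRYABLE or attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so the competing
            # writers are less likely to collide again in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * random.uniform(0.5, 1.5))
```

The whole transaction has to be retried, not just the failed statement, because PostgreSQL aborts the transaction that it picks as the deadlock victim.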

Answer 1: (score: 1)

The syntax of your query makes it easy to sort the values:

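INSERT INTO documents
          (version, source, source_id, ingestion_date)
   -- The cast assumes ingestion_date is a timestamp column: literals in a
   -- VALUES subquery resolve to text instead of the target column type.
   SELECT version, source, source_id, ingestion_date::timestamptz FROM (
      VALUES
          (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')
      ) AS v (version, source, source_id, ingestion_date)
      ORDER BY source, source_id

      ON CONFLICT (source, source_id)
      DO UPDATE
      SET
          ingestion_date = EXCLUDED.ingestion_date,
          version = documents.version + 1

      RETURNING source_id, source, uid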

This should solve your problem. Performance should be fine, because the sort is small.
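The same ordering can also be applied in the application before the statement is built. The sketch below assumes each row is a tuple in the column order used above (version, source, source_id, ingestion_date); `sort_rows` is a hypothetical helper name. The exact collation does not matter, as long as every writer sorts by the same key in the same way.

```python
from operator import itemgetter


def sort_rows(rows):
    """Order rows by (source, source_id) so that every batch touches
    overlapping rows in the same sequence."""
    return sorted(rows, key=itemgetter(1, 2))


# The two batches from the deadlock report.
batch_a = [
    (0, "googledrive", "alpha", "2017-09-21T07:03:51.074Z"),
    (0, "googledrive", "beta", "2017-09-21T07:03:51.074Z"),
    (0, "googledrive", "gamma", "2017-09-21T07:03:51.074Z"),
    (0, "googledrive", "delta", "2017-09-21T07:03:51.074Z"),
    (0, "googledrive", "epsilon", "2017-09-21T07:03:51.074Z"),
    (0, "googledrive", "zeta", "2017-09-21T07:03:51.074Z"),
]
batch_b = [
    (0, "googledrive", "delta", "2017-09-21T07:03:51.167Z"),
    (0, "googledrive", "gamma", "2017-09-21T07:03:51.167Z"),
]

# After sorting, both batches reach delta before gamma.
print([row[2] for row in sort_rows(batch_a)])
print([row[2] for row in sort_rows(batch_b)])
```

With both batches sorted, whichever transaction locks delta first makes the other one wait there, instead of each holding one row and waiting for the other.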