PostgreSQL trigger function deadlock

Time: 2013-12-10 21:48:16

Tags: postgresql triggers deadlock

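Using Postgres 9.3, I have a table called regression_runs that stores some counters. When a row in this table is inserted, updated, or deleted, a trigger function is called to update a row in the nightly_runs table, so that nightly_runs keeps a running total of those counters across all regression_runs with a given nightly_run_id. The approach I've taken is a fairly common one. The problem is that I get deadlocks when multiple processes try to insert new rows into regression_runs with the same nightly_run_id at the same time.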

The regression_runs table looks like this:

regression=> \d regression_runs
                                      Table "public.regression_runs"
     Column      |           Type           |                          Modifiers                           
-----------------+--------------------------+--------------------------------------------------------------
 id              | integer                  | not null default nextval('regression_runs_id_seq'::regclass)
 username        | character varying(16)    | not null
 nightly_run_id  | integer                  | 
 nightly_run_pid | integer                  | 
 passes          | integer                  | not null default 0
 failures        | integer                  | not null default 0
 errors          | integer                  | not null default 0
 skips           | integer                  | not null default 0
Indexes:
    "regression_runs_pkey" PRIMARY KEY, btree (id)
    "regression_runs_nightly_run_id_idx" btree (nightly_run_id)
Foreign-key constraints:
    "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    regression_run_update_trigger AFTER INSERT OR DELETE OR UPDATE ON regression_runs FOR EACH ROW EXECUTE PROCEDURE regression_run_update()

The nightly_runs table looks like this:

regression=> \d nightly_runs
                                    Table "public.nightly_runs"
   Column   |           Type           |                         Modifiers                         
------------+--------------------------+-----------------------------------------------------------
 id         | integer                  | not null default nextval('nightly_runs_id_seq'::regclass)
 passes     | integer                  | not null default 0
 failures   | integer                  | not null default 0
 errors     | integer                  | not null default 0
 skips      | integer                  | not null default 0
Indexes:
    "nightly_runs_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "regression_runs" CONSTRAINT "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE

The trigger function regression_run_update is:

CREATE OR REPLACE FUNCTION regression_run_update() RETURNS "trigger"
    AS $$
        BEGIN
        IF TG_OP = 'UPDATE' THEN
                IF (NEW.nightly_run_id IS NOT NULL) and (NEW.nightly_run_id = OLD.nightly_run_id) THEN
                        UPDATE nightly_runs SET passes = passes + (NEW.passes - OLD.passes), failures = failures + (NEW.failures - OLD.failures), errors = errors + (NEW.errors - OLD.errors), skips = skips + (NEW.skips - OLD.skips) WHERE id = NEW.nightly_run_id;
                ELSE
                        IF NEW.nightly_run_id IS NOT NULL THEN
                                UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id;
                        END IF;
                        IF OLD.nightly_run_id IS NOT NULL THEN
                                UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id;
                        END IF;
                END IF;
        ELSIF TG_OP = 'INSERT' THEN
                IF NEW.nightly_run_id IS NOT NULL THEN
                        UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id;
                END IF;
        ELSIF TG_OP = 'DELETE' THEN
                IF OLD.nightly_run_id IS NOT NULL THEN
                        UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id;
                END IF;
        END IF;
        RETURN NEW;
        END;
$$  
    LANGUAGE plpgsql;

This is what I see in the Postgres log file:

ERROR:  deadlock detected
DETAIL:  Process 20266 waits for ShareLock on transaction 7520; blocked by process 20263.
        Process 20263 waits for ExclusiveLock on tuple (1,70) of relation 18469 of database 18354; blocked by process 20266.
        Process 20266: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262);
        Process 20263: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20260);
HINT:  See server log for query details.
CONTEXT:  SQL statement "UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id"
        PL/pgSQL function regression_run_update() line 16 at SQL statement
STATEMENT:  insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262);

I can reproduce the problem with this script:

#!/usr/bin/env python

import os
import multiprocessing
import psycopg2

class Foo(object):
    def child(self):
        pid = os.getpid()
        # Each child process opens its own connection.
        conn = psycopg2.connect(
            'dbname=regression host=localhost user=regression')
        cur = conn.cursor()
        for i in range(100):
            # Every insert fires regression_run_update(), which updates
            # the same nightly_runs row (id = self.nid) in all children.
            cur.execute(
                "insert into regression_runs "
                "(username, nightly_run_id, nightly_run_pid) "
                "values "
                "('tbeadle', %s, %s);", (self.nid, pid))
            conn.commit()
        conn.close()

    def start(self):
        # Create one nightly_runs row that all children will share.
        conn = psycopg2.connect(
            'dbname=regression host=localhost user=regression')
        cur = conn.cursor()
        cur.execute('insert into nightly_runs default values returning id;')
        row = cur.fetchone()
        conn.commit()
        conn.close()
        self.nid = row[0]
        # Start 5 concurrent inserters and wait for them to finish.
        procs = []
        for child in range(5):
            procs.append(multiprocessing.Process(target=self.child))
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()

if __name__ == '__main__':
    Foo().start()

I can't figure out why the deadlock happens or what I can do about it. Please help!

1 answer:

Answer 0 (score: 3)

Deadlocks like this usually happen because the updates related to OLD and NEW are not performed in a consistent order. A good example:

IF TG_OP = 'UPDATE' THEN
  IF (NEW.nightly_run_id IS NOT NULL) AND (NEW.nightly_run_id = OLD.nightly_run_id) THEN
    -- stuff that seems fine
  ELSE
    IF NEW.nightly_run_id IS NOT NULL THEN
      UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id; -- lock
    END IF;
    IF OLD.nightly_run_id IS NOT NULL THEN
      UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id; -- lock
    END IF;

Imagine two transactions:

  • T1 acquires the lock for new.nightly_run_id = 1 and waits for the lock on old.nightly_run_id = 2.
  • T2 acquires the lock for new.nightly_run_id = 2 and waits for the lock on old.nightly_run_id = 1.

...and you have a deadlock.
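The interleaving above can be reproduced outside the database with a small Python sketch. It is only a model: two threading.Lock objects are hypothetical stand-ins for the row locks on nightly_runs ids 1 and 2, and a lock timeout plays the role of PostgreSQL's deadlock detector (the real server aborts one of the transactions instead). The function name simulate_opposite_order is made up for illustration.

```python
import threading

def simulate_opposite_order(timeout=0.5):
    """Model T1 locking row 1 then 2, and T2 locking row 2 then 1.

    Returns a dict saying whether each transaction got its second lock.
    """
    # Hypothetical stand-ins for the row locks on nightly_runs ids 1 and 2.
    locks = {1: threading.Lock(), 2: threading.Lock()}
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def transaction(name, first, second):
        with locks[first]:
            # Wait until the other transaction also holds its first lock.
            both_hold_first.wait()
            # The timeout stands in for PostgreSQL's deadlock detector.
            got = locks[second].acquire(timeout=timeout)
            if got:
                locks[second].release()
            results[name] = got
            # Keep holding `first` until the other side has also tried.
            both_tried_second.wait()

    t1 = threading.Thread(target=transaction, args=("T1", 1, 2))
    t2 = threading.Thread(target=transaction, args=("T2", 2, 1))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

# Both entries are False: each side waits on a lock the other one holds.
print(simulate_opposite_order())
```

Because each transaction holds its first lock until the other has finished trying, neither second acquisition can ever succeed, which is exactly the cycle described in the two bullets.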

Enforce a consistent order to avoid this situation:

IF OLD.nightly_run_id = NEW.nightly_run_id THEN
  -- stuff that seems fine
ELSIF OLD.nightly_run_id < NEW.nightly_run_id THEN
  UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
  UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
ELSIF NEW.nightly_run_id < OLD.nightly_run_id THEN
  UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
  UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
ELSIF OLD.nightly_run_id IS NOT NULL THEN
  UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
ELSIF NEW.nightly_run_id IS NOT NULL THEN
  UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
END IF;
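To check that this IF/ELSIF chain really issues its UPDATEs in ascending id order for every combination of OLD and NEW, here is a rough Python model of its branches (not PL/pgSQL; update_order is a hypothetical name). None stands for SQL NULL: in PL/pgSQL, `=` and `<` against NULL yield NULL, so those branches are skipped and control falls through to the IS NOT NULL branches, which the model reproduces.

```python
from typing import List, Optional

def update_order(old_id: Optional[int], new_id: Optional[int]) -> List[int]:
    """Return the nightly_runs ids the chain updates, in the order it updates them.

    None models SQL NULL.
    """
    if old_id is not None and new_id is not None:
        if old_id == new_id:
            return [old_id]            # single in-place adjustment
        return sorted([old_id, new_id])  # lower id is always locked first
    if old_id is not None:
        return [old_id]
    if new_id is not None:
        return [new_id]
    return []                          # both NULL: nothing to update

for old, new in [(1, 2), (2, 1), (None, 5), (5, None)]:
    print(old, new, update_order(old, new))
# 1 2 [1, 2]
# 2 1 [1, 2]
# None 5 [5]
# 5 None [5]
```

Since both (1, 2) and (2, 1) produce [1, 2], the two transactions from the bullet list would now contend for row 1 first, and the later one simply waits instead of forming a cycle.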

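The same change should be made in your other triggers where applicable. Barring other pathologies in your code, the deadlocks should go away.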
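The general rule here is to lock rows in one consistent order everywhere. A minimal Python sketch of that rule, assuming threading.Lock objects as hypothetical stand-ins for row locks (it does not talk to PostgreSQL; lock_rows_in_order, release_rows and run_workers are made-up helpers): several workers touch the same two ids in both directions, but each one sorts the ids before locking, so every worker finishes.

```python
import threading

def lock_rows_in_order(locks, ids):
    """Acquire the locks for `ids` in ascending id order."""
    ordered = sorted(set(ids))
    for i in ordered:
        locks[i].acquire()
    return ordered

def release_rows(locks, ordered):
    """Release in reverse acquisition order."""
    for i in reversed(ordered):
        locks[i].release()

def run_workers(pairs, iterations=1000, join_timeout=5.0):
    ids = {i for pair in pairs for i in pair}
    locks = {i: threading.Lock() for i in ids}
    counts = {i: 0 for i in ids}

    def worker(pair):
        for _ in range(iterations):
            held = lock_rows_in_order(locks, pair)
            try:
                for i in held:
                    counts[i] += 1  # stands in for the UPDATE on that row
            finally:
                release_rows(locks, held)

    # Daemon threads, so a deadlock (if one ever happened) cannot hang the process.
    threads = [threading.Thread(target=worker, args=(p,), daemon=True)
               for p in pairs]
    for t in threads:
        t.start()
    for t in threads:
        t.join(join_timeout)
    finished = not any(t.is_alive() for t in threads)
    return finished, counts

finished, counts = run_workers([(1, 2), (2, 1), (1, 2), (2, 1)])
print(finished, counts)  # True {1: 4000, 2: 4000}
```

Sorting is the simplest total order to agree on; any order works as long as every code path that locks more than one nightly_runs row uses the same one.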