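Using Postgres 9.3, I have a table called regression_runs that stores some counters. When a row in this table is updated, inserted, or deleted, a trigger function is called to update a row in the nightly_runs table, which keeps a running total of those counters for all regression_runs with a given ID. The approach I have taken is fairly widely used. The problem is that I run into deadlocks when multiple processes simultaneously try to insert new rows into regression_runs with the same nightly_run_id.
The regression_runs table looks like this: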
regression=> \d regression_runs
                                  Table "public.regression_runs"
     Column      |           Type           |                          Modifiers
-----------------+--------------------------+--------------------------------------------------------------
 id              | integer                  | not null default nextval('regression_runs_id_seq'::regclass)
 username        | character varying(16)    | not null
 nightly_run_id  | integer                  |
 nightly_run_pid | integer                  |
 passes          | integer                  | not null default 0
 failures        | integer                  | not null default 0
 errors          | integer                  | not null default 0
 skips           | integer                  | not null default 0
Indexes:
    "regression_runs_pkey" PRIMARY KEY, btree (id)
    "regression_runs_nightly_run_id_idx" btree (nightly_run_id)
Foreign-key constraints:
    "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers:
    regression_run_update_trigger AFTER INSERT OR DELETE OR UPDATE ON regression_runs FOR EACH ROW EXECUTE PROCEDURE regression_run_update()
The nightly_runs table looks like this:
regression=> \d nightly_runs
                               Table "public.nightly_runs"
   Column   |           Type           |                         Modifiers
------------+--------------------------+-----------------------------------------------------------
 id         | integer                  | not null default nextval('nightly_runs_id_seq'::regclass)
 passes     | integer                  | not null default 0
 failures   | integer                  | not null default 0
 errors     | integer                  | not null default 0
 skips      | integer                  | not null default 0
Indexes:
    "nightly_runs_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "regression_runs" CONSTRAINT "regression_runs_nightly_run_id_fkey" FOREIGN KEY (nightly_run_id) REFERENCES nightly_runs(id) ON UPDATE CASCADE ON DELETE CASCADE
The trigger function regression_run_update is:
CREATE OR REPLACE FUNCTION regression_run_update() RETURNS "trigger"
AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        IF (NEW.nightly_run_id IS NOT NULL) and (NEW.nightly_run_id = OLD.nightly_run_id) THEN
            UPDATE nightly_runs SET passes = passes + (NEW.passes - OLD.passes), failures = failures + (NEW.failures - OLD.failures), errors = errors + (NEW.errors - OLD.errors), skips = skips + (NEW.skips - OLD.skips) WHERE id = NEW.nightly_run_id;
        ELSE
            IF NEW.nightly_run_id IS NOT NULL THEN
                UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id;
            END IF;
            IF OLD.nightly_run_id IS NOT NULL THEN
                UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id;
            END IF;
        END IF;
    ELSIF TG_OP = 'INSERT' THEN
        IF NEW.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id;
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        IF OLD.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs SET passes = passes - OLD.passes, failures = failures - OLD.failures, errors = errors - OLD.errors, skips = skips - OLD.skips WHERE id = OLD.nightly_run_id;
        END IF;
    END IF;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
This is what I see in the Postgres log file:
ERROR: deadlock detected
DETAIL: Process 20266 waits for ShareLock on transaction 7520; blocked by process 20263.
Process 20263 waits for ExclusiveLock on tuple (1,70) of relation 18469 of database 18354; blocked by process 20266.
Process 20266: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262);
Process 20263: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20260);
HINT: See server log for query details.
CONTEXT: SQL statement "UPDATE nightly_runs SET passes = passes + NEW.passes, failures = failures + NEW.failures, errors = errors + NEW.errors, skips = skips + NEW.skips WHERE id = NEW.nightly_run_id"
PL/pgSQL function regression_run_update() line 16 at SQL statement
STATEMENT: insert into regression_runs (username, nightly_run_id, nightly_run_pid) values ('tbeadle', 135, 20262);
I can reproduce the problem with this script:
#!/usr/bin/env python
import os
import multiprocessing

import psycopg2


class Foo(object):
    def child(self):
        # Each child inserts 100 rows that all reference the same nightly run.
        pid = os.getpid()
        conn = psycopg2.connect(
            'dbname=regression host=localhost user=regression')
        cur = conn.cursor()
        for i in range(100):
            cur.execute(
                "insert into regression_runs "
                "(username, nightly_run_id, nightly_run_pid) "
                "values "
                "('tbeadle', %s, %s);", (self.nid, pid))
            conn.commit()
        return

    def start(self):
        # Create one nightly run, then start 5 concurrent inserters against it.
        conn = psycopg2.connect(
            'dbname=regression host=localhost user=regression')
        cur = conn.cursor()
        cur.execute('insert into nightly_runs default values returning id;')
        row = cur.fetchone()
        conn.commit()
        self.nid = row[0]
        procs = []
        for child in range(5):
            procs.append(multiprocessing.Process(target=self.child))
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()


if __name__ == '__main__':
    Foo().start()
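I can't figure out why the deadlock occurs or what I can do about it. Please help!
Answer 0: (score: 3)
Deadlocks usually occur because the updates tied to OLD and NEW are not applied in a consistent order. A good example: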
IF TG_OP = 'UPDATE' THEN
    IF (NEW.nightly_run_id IS NOT NULL) AND (NEW.nightly_run_id = OLD.nightly_run_id) THEN
        -- stuff that seems fine
    ELSE
        IF NEW.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id; -- lock
        END IF;
        IF OLD.nightly_run_id IS NOT NULL THEN
            UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id; -- lock
        END IF;
Imagine two transactions:
... deadlock
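To make the scenario concrete, here is a minimal sketch of one such interleaving. It is only a toy model, not how Postgres detects deadlocks: the function name `find_deadlock`, the transaction names A and B, and the row ids 1 and 2 are all hypothetical. Transaction A moves a regression run from nightly run 1 to 2, so with the NEW-then-OLD order it locks row 2 and then row 1; transaction B moves one from 2 to 1 and locks row 1 and then row 2.

```python
def find_deadlock(tx_a, tx_b):
    """Step two transactions in lockstep.

    Each argument is the list of nightly_runs row ids a transaction locks,
    in order. Returns True if both end up waiting on each other, False if
    both finish.
    """
    plans = {"A": list(tx_a), "B": list(tx_b)}
    pos = {"A": 0, "B": 0}  # index of the next lock each transaction wants
    owner = {}              # row id -> name of the transaction holding it
    while True:
        progressed = False
        for name in ("A", "B"):
            if pos[name] == len(plans[name]):
                continue  # already finished
            row = plans[name][pos[name]]
            if owner.get(row, name) == name:
                # Row is free (or already ours): take the lock and move on.
                owner[row] = name
                pos[name] += 1
                progressed = True
                if pos[name] == len(plans[name]):
                    # "Commit": release every lock this transaction holds.
                    for held in [r for r, o in owner.items() if o == name]:
                        del owner[held]
        if all(pos[n] == len(plans[n]) for n in plans):
            return False
        if not progressed:
            return True  # nobody could advance: each waits on the other


print(find_deadlock([2, 1], [1, 2]))  # opposite lock order → True
print(find_deadlock([1, 2], [1, 2]))  # same lock order → False
```

In the first call each transaction grabs its first row and then waits forever for the row the other one holds. In the second call B simply waits until A commits.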
Enforcing an order avoids one class of deadlock:
IF OLD.nightly_run_id = NEW.nightly_run_id THEN
    -- stuff that seems fine
ELSIF OLD.nightly_run_id < NEW.nightly_run_id THEN
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
ELSIF NEW.nightly_run_id < OLD.nightly_run_id THEN
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
ELSIF OLD.nightly_run_id IS NOT NULL THEN
    UPDATE nightly_runs ... WHERE id = OLD.nightly_run_id;
ELSIF NEW.nightly_run_id IS NOT NULL THEN
    UPDATE nightly_runs ... WHERE id = NEW.nightly_run_id;
END IF;
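The branch logic above boils down to "touch the row with the lower id first". As a rough sketch of that rule outside PL/pgSQL, the hypothetical helper `ordered_updates` below (the name and the step labels are mine, and `None` stands in for SQL NULL) returns the nightly_runs rows to update in the order they should be locked:

```python
def ordered_updates(old_id, new_id):
    """Return (nightly_run_id, action) pairs, lowest id first."""
    if old_id is not None and old_id == new_id:
        # Same parent row: a single update with the difference.
        return [(old_id, "apply NEW - OLD")]
    steps = []
    if old_id is not None:
        steps.append((old_id, "subtract OLD"))
    if new_id is not None:
        steps.append((new_id, "add NEW"))
    # Always lock rows in ascending id order, whichever role they play.
    return sorted(steps, key=lambda step: step[0])


print(ordered_updates(5, 3))     # [(3, 'add NEW'), (5, 'subtract OLD')]
print(ordered_updates(3, 5))     # [(3, 'subtract OLD'), (5, 'add NEW')]
print(ordered_updates(None, 7))  # [(7, 'add NEW')]
```

Whichever direction a row moves, the two parent rows are always visited in the same order, so two concurrent moves between the same pair of nightly runs cannot each hold the row the other needs.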
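Where applicable, the same change should be made to your other triggers. Barring other pathologies in your code, the deadlocks should go away.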