Unable to update a table in PostgreSQL using Django due to a deadlock

Asked: 2014-02-28 15:55:04

Tags: sql django postgresql postgresql-9.1 django-orm

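I have a cron job that runs a Django management command every minute. When the command starts, it sets a flag, is_running = true, in a table stored in a PostgreSQL database (with pgbouncer as the connection pool). Later runs check this flag so that the same task is not started again while a run is still in progress.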
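
A minimal sketch of the flag pattern described above, assuming the check and the set are folded into a single conditional UPDATE. The in-memory sqlite3 database only stands in for PostgreSQL so that the sketch runs on its own; the table layout and the helper names try_claim and release are hypothetical and are not the actual Django code.

```python
import sqlite3


def try_claim(conn, job_id):
    """Set is_running only if it is currently clear; report whether we won."""
    cur = conn.execute(
        "UPDATE myapp_job SET is_running = 1 WHERE id = ? AND is_running = 0",
        (job_id,),
    )
    conn.commit()  # commit immediately so no transaction is left open
    return cur.rowcount == 1


def release(conn, job_id):
    """Clear the flag once the task has finished."""
    conn.execute("UPDATE myapp_job SET is_running = 0 WHERE id = ?", (job_id,))
    conn.commit()


# Stand-in table; in the question this is a Django model stored in PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_job (id INTEGER PRIMARY KEY, is_running INTEGER NOT NULL)"
)
conn.execute("INSERT INTO myapp_job (id, is_running) VALUES (32, 0)")
conn.commit()

first = try_claim(conn, 32)   # flag was clear, so this run proceeds
second = try_claim(conn, 32)  # flag is already set, so this run should exit
release(conn, 32)
third = try_claim(conn, 32)   # flag was cleared, so a later run proceeds
```

Each helper commits as soon as its statement has run, so no transaction stays open between the claim and the release.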

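One day I noticed the system was very slow, and running ps aux | grep manage.py showed hundreds of these processes, all apparently doing nothing.

I then ran ps aux|grep -i postgres and saw just as many lines like: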

postgres: dbuser dbname 127.0.0.1(47095) UPDATE waiting

Querying pg_stat_activity shows that these processes are all running the query that sets is_running = true:

UPDATE "myapp_job" SET "is_running" = true WHERE "myapp_job"."id" = 32

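This is effectively a very slow memory leak, and it is also consuming all of my database connections. As a stop-gap, I've been killing the hung postgres processes whose update never finishes, but that doesn't fix the underlying problem.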

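Why doesn't this simple query complete? It appears to be deadlocked, but I don't understand why. Nothing else should be locking that table apart from other update queries, and none of those should take more than a few milliseconds.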

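Also, why doesn't the query time out? I've run into deadlocks before, and Django/Postgres usually throws an explicit deadlock error, which I can then catch and use to retry the operation. But some of these queries have been waiting for more than 12 hours.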
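
The catch-and-retry approach mentioned above might look roughly like the sketch below. DeadlockDetected is a hypothetical stand-in for whatever exception the driver or Django raises, and retry_on_deadlock and flaky_update are illustrative names. Note that it only covers the case where an error is actually raised: a session that is simply waiting for a lock raises nothing, so a loop like this would never fire for the stuck updates described here.

```python
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the database driver's deadlock error."""


def retry_on_deadlock(operation, attempts=3, delay=0.0):
    """Call operation(), retrying when it raises DeadlockDetected."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeadlockDetected:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * attempt)  # linear backoff before the next try


# Simulated operation that hits a deadlock twice and then succeeds.
calls = []


def flaky_update():
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockDetected("deadlock detected")
    return "updated"


result = retry_on_deadlock(flaky_update)
```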

Could my use of pgbouncer be causing transactions to stay open, and so preventing the locks on my table from being released?

Querying pg_locks directly shows several exclusive locks on my table, as well as several stalled processes:

SELECT  relation::regclass,
    locktype,
    pid,
    mode,
    granted
FROM pg_locks
WHERE relation::regclass::varchar like 'myapp_job';

shows:

+-----------+----------+-------+------------------+---------+
| relation  | locktype |  pid  |       mode       | granted |
+-----------+----------+-------+------------------+---------+
| myapp_job | relation |  1995 | AccessShareLock  | t       |
| myapp_job | relation |  1995 | RowExclusiveLock | t       |
| myapp_job | tuple    | 31497 | ExclusiveLock    | t       |
| myapp_job | relation |  5773 | AccessShareLock  | t       |
| myapp_job | relation |  1904 | AccessShareLock  | t       |
| myapp_job | relation |  1904 | RowExclusiveLock | t       |
| myapp_job | relation |  1858 | AccessShareLock  | t       |
| myapp_job | relation |  1858 | RowExclusiveLock | t       |
| myapp_job | relation | 32348 | RowShareLock     | t       |
| myapp_job | relation | 31497 | RowShareLock     | t       |
| myapp_job | tuple    |  1995 | ExclusiveLock    | f       |
| myapp_job | tuple    | 32348 | ExclusiveLock    | f       |
| myapp_job | tuple    |  1858 | ExclusiveLock    | f       |
| myapp_job | tuple    |  1904 | ExclusiveLock    | f       |
| myapp_job | tuple    |  1950 | ExclusiveLock    | f       |
| myapp_job | relation |  1950 | AccessShareLock  | t       |
| myapp_job | relation |  1950 | RowExclusiveLock | t       |
| myapp_job | relation |  5731 | AccessShareLock  | t       |
| myapp_job | relation |  5731 | RowShareLock     | t       |
+-----------+----------+-------+------------------+---------+
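
To make the output above easier to read, the rows can be grouped by pid and split into sessions that are waiting for a lock on this table and sessions that hold everything they asked for. This is only a sketch over the data already shown; split_by_wait is a hypothetical helper. A pid listed as not waiting may still be waiting on a lock of another kind that the relation filter in the query hides.

```python
from collections import defaultdict

# (locktype, pid, mode, granted) copied from the pg_locks output above.
ROWS = [
    ("relation", 1995, "AccessShareLock", True),
    ("relation", 1995, "RowExclusiveLock", True),
    ("tuple", 31497, "ExclusiveLock", True),
    ("relation", 5773, "AccessShareLock", True),
    ("relation", 1904, "AccessShareLock", True),
    ("relation", 1904, "RowExclusiveLock", True),
    ("relation", 1858, "AccessShareLock", True),
    ("relation", 1858, "RowExclusiveLock", True),
    ("relation", 32348, "RowShareLock", True),
    ("relation", 31497, "RowShareLock", True),
    ("tuple", 1995, "ExclusiveLock", False),
    ("tuple", 32348, "ExclusiveLock", False),
    ("tuple", 1858, "ExclusiveLock", False),
    ("tuple", 1904, "ExclusiveLock", False),
    ("tuple", 1950, "ExclusiveLock", False),
    ("relation", 1950, "AccessShareLock", True),
    ("relation", 1950, "RowExclusiveLock", True),
    ("relation", 5731, "AccessShareLock", True),
    ("relation", 5731, "RowShareLock", True),
]


def split_by_wait(rows):
    """Return (waiting, not_waiting): sorted pids with and without an ungranted lock."""
    granted_by_pid = defaultdict(list)
    for _locktype, pid, _mode, granted in rows:
        granted_by_pid[pid].append(granted)
    waiting = sorted(p for p, flags in granted_by_pid.items() if not all(flags))
    not_waiting = sorted(p for p, flags in granted_by_pid.items() if all(flags))
    return waiting, not_waiting


waiting, not_waiting = split_by_wait(ROWS)
print("waiting:", waiting)
print("not waiting:", not_waiting)
```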

Joining pg_locks to pg_stat_activity shows which queries are causing the deadlock:

SELECT  pl.relation::regclass,
    pl.locktype,
    pl.pid,
    pl.mode,
    pl.granted,
    pa.query_start,
    pa.current_query as query___________________________________
FROM pg_locks as pl
inner join pg_stat_activity as pa on pa.procpid = pl.pid
WHERE pl.relation::regclass::varchar like 'myapp_job'
order by pa.query_start;

gives:

+-----------+----------+------+------------------+---------+-------------------------------+-----------------------------------------------------------------+
| relation  | locktype | pid  |       mode       | granted |          query_start          |                              query                              |
+-----------+----------+------+------------------+---------+-------------------------------+-----------------------------------------------------------------+
| myapp_job | relation | 5731 | AccessShareLock  | t       | 2014-02-28 00:00:01.936118-05 | <IDLE> in transaction                                           |
| myapp_job | relation | 5731 | RowShareLock     | t       | 2014-02-28 00:00:01.936118-05 | <IDLE> in transaction                                           |
| myapp_job | relation | 5773 | AccessShareLock  | t       | 2014-02-28 07:33:37.967912-05 | <IDLE> in transaction                                           |
| myapp_job | tuple    | 3867 | ExclusiveLock    | t       | 2014-02-28 10:46:47.363178-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3867 | RowExclusiveLock | t       | 2014-02-28 10:46:47.363178-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3867 | AccessShareLock  | t       | 2014-02-28 10:46:47.363178-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | tuple    | 3893 | ExclusiveLock    | f       | 2014-02-28 10:47:01.860486-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3893 | AccessShareLock  | t       | 2014-02-28 10:47:01.860486-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3893 | RowExclusiveLock | t       | 2014-02-28 10:47:01.860486-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3932 | RowExclusiveLock | t       | 2014-02-28 10:48:02.124961-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | relation | 3932 | AccessShareLock  | t       | 2014-02-28 10:48:02.124961-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
| myapp_job | tuple    | 3932 | ExclusiveLock    | f       | 2014-02-28 10:48:02.124961-05 | UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32  |
+-----------+----------+------+------------------+---------+-------------------------------+-----------------------------------------------------------------+
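
One way to read the joined output is to pick out the sessions that are idle inside an open transaction yet still hold granted locks on the table, oldest first. The sketch below does that over the rows shown above; idle_lock_holders is a hypothetical helper, and the result is only a list of candidates worth inspecting, not proof that they are the blockers.

```python
IDLE = "<IDLE> in transaction"
UPDATE = "UPDATE myapp_job SET is_running = true WHERE myapp_job.id = 32"

# (pid, mode, granted, query_start, query) copied from the joined output above.
ROWS = [
    (5731, "AccessShareLock", True, "2014-02-28 00:00:01.936118-05", IDLE),
    (5731, "RowShareLock", True, "2014-02-28 00:00:01.936118-05", IDLE),
    (5773, "AccessShareLock", True, "2014-02-28 07:33:37.967912-05", IDLE),
    (3867, "ExclusiveLock", True, "2014-02-28 10:46:47.363178-05", UPDATE),
    (3867, "RowExclusiveLock", True, "2014-02-28 10:46:47.363178-05", UPDATE),
    (3867, "AccessShareLock", True, "2014-02-28 10:46:47.363178-05", UPDATE),
    (3893, "ExclusiveLock", False, "2014-02-28 10:47:01.860486-05", UPDATE),
    (3893, "AccessShareLock", True, "2014-02-28 10:47:01.860486-05", UPDATE),
    (3893, "RowExclusiveLock", True, "2014-02-28 10:47:01.860486-05", UPDATE),
    (3932, "RowExclusiveLock", True, "2014-02-28 10:48:02.124961-05", UPDATE),
    (3932, "AccessShareLock", True, "2014-02-28 10:48:02.124961-05", UPDATE),
    (3932, "ExclusiveLock", False, "2014-02-28 10:48:02.124961-05", UPDATE),
]


def idle_lock_holders(rows):
    """Return (query_start, pid, granted modes) for idle-in-transaction sessions."""
    holders = {}
    for pid, mode, granted, query_start, query in rows:
        if query == IDLE and granted:
            _start, modes = holders.setdefault(pid, (query_start, set()))
            modes.add(mode)
    # All timestamps share one format and UTC offset, so string order is time order.
    return sorted(
        (start, pid, sorted(modes)) for pid, (start, modes) in holders.items()
    )


idle = idle_lock_holders(ROWS)
for start, pid, modes in idle:
    print(pid, start, modes)
```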

As you can see, the oldest ExclusiveLock has been granted, but its query has been running for more than ten minutes just to set is_running = true on a single row. Why is that?

0 answers:

No answers yet.