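Deadlocks in Oracle

Date: 2015-02-11 13:10:31

Tags: oracle database-deadlocks

I want to create a script that automatically kills Oracle sessions that have gone into a deadlock. Is it possible to find the session ID of the sessions involved in a deadlock? So far I have had to bounce (restart) the database to remove the deadlock. Is there a solution to this problem?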

2 answers:

Answer 0 (score: 7)

  

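"I want to create a script that automatically kills Oracle sessions that have gone into a deadlock"

Edited to explain things better, correct a few sentences, and add a test case that demonstrates the deadlock scenario.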

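Why reinvent the wheel? Oracle detects a deadlock automatically, raises ORA-00060: deadlock detected while waiting for resource, and rolls back the statement that hit the error in the session it picks as the victim. Statements that succeeded earlier in that transaction are not rolled back, and if you issue a commit after the deadlock error they are committed. Once the victim commits or rolls back, the other session's blocked statement goes through and that session can commit as well. You do not need to do anything explicitly here. Deadlocks are cleared automatically; you never need to clear them yourself.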

Oracle usually takes only a few seconds to detect a deadlock and raise the error.

You can try it with a simple test case, as demonstrated here: Understanding Oracle Deadlock

Let's look at the test case -

SQL> CREATE TABLE t_test(col_1 NUMBER, col_2 NUMBER);

Table created
SQL> INSERT INTO t_test VALUES(1,2);

1 row inserted
SQL> INSERT INTO t_test VALUES(3,4);

1 row inserted

SQL> COMMIT;

Commit complete

SQL> SELECT * FROM t_test;

     COL_1      COL_2
---------- ----------
         1          2
         3          4

Note the time of each statement; I have set time on and timing on to make the sequence easier to follow.

SESSION: 1

12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;

1 row updated.

Elapsed: 00:00:00.00

SESSION: 2

12:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;

1 row updated.

Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;

At this point, SESSION 2 keeps waiting.

SESSION: 1

12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

At this point, SESSION 2 is the victim of the deadlock, and SESSION 1 is still waiting.

Let's look at the session details from SESSION 2 -

12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS      BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
        14 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network         NOT IN WAIT SQL*Net message to client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Application     VALID       enq: TX - row lock contention

Elapsed: 00:00:00.00
12:22:18 SQL>

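So, when v$session is queried from SESSION 2 itself, SESSION 2 (SID 14) shows a status of ACTIVE, simply because it is the session running the query.

Now let's look at the session details from another session, call it SESSION 3. Remember, SESSION 1 is still waiting.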

SQL> set time on timing on
12:24:41 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS      BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ------------------------------
        13 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network         NOT IN WAIT SQL*Net message to client
        14 INACTIVE sqlplus.exe                   WAITING             Idle            NO HOLDER   SQL*Net message from client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Application     VALID       enq: TX - row lock contention


Elapsed: 00:00:00.01
12:24:44 SQL>

So, seen from another session, SESSION 2 (SID 14) is INACTIVE. SESSION 1 is still waiting on the event enq: TX - row lock contention.

Let's commit in SESSION 2 -
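The queries above show which sessions are waiting, but not on whom. As a minimal sketch, assuming your account can read v$session, the blocking_session column of that view can be used to pair each waiter with its blocker. The column aliases are only illustrative. Note that this lists ordinary lock waits as well; a true deadlock is usually resolved by Oracle before you can query it, and its details are written to the alert log and a trace file.

```sql
-- List every session that is currently blocked, together with the SID blocking it.
-- Longest waiters first.
SELECT w.sid              AS waiting_sid,
       w.serial#          AS waiting_serial,
       w.blocking_session AS blocking_sid,
       w.event,
       w.seconds_in_wait
FROM   v$session w
WHERE  w.blocking_session IS NOT NULL
ORDER  BY w.seconds_in_wait DESC;
```

In the test case here, this would be expected to report SID 134 (SESSION 1) as waiting on SID 14 (SESSION 2).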

12:22:18 SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
12:25:43 SQL>

At this point, the lock that SESSION 1 was waiting for is released; let's commit in SESSION 1 -

12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

1 row updated.

Elapsed: 00:08:27.29
12:25:43 SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
12:26:26 SQL>

Elapsed: 00:08:27.29 shows that SESSION 1 waited all that time, until SESSION 2 committed.

To summarize, here is the whole story of SESSION 1 -

12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;

1 row updated.

Elapsed: 00:00:00.00
12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

1 row updated.

Elapsed: 00:08:27.29
12:25:43 SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
12:26:26 SQL>

To summarize, here is the whole story of SESSION 2 -

12:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;

1 row updated.

Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;
UPDATE t_test SET col_1 = 7 WHERE col_2=2
                                  *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource


Elapsed: 00:00:24.47
12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS      BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
        14 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network         NOT IN WAIT SQL*Net message to client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Application     VALID       enq: TX - row lock contention

Elapsed: 00:00:00.00
12:22:18 SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
12:25:43 SQL>

Now, let's see which statement was actually rolled back and which ones were committed -

12:25:43 SQL> select * from t_test;

     COL_1      COL_2
---------- ----------
         5          2
         8          4

Elapsed: 00:00:00.00
12:30:36 SQL>
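The deadlock in this test case arose because the two sessions locked the same two rows in opposite order. A sketch of the usual remedy, assuming the application can control the order of its statements, is to have every transaction touch the rows in the same order. The reordering below is a hypothetical rewrite of the test case, not something from the original run.

```sql
-- Hypothetical rewrite of the two sessions from the test case:
-- both lock the col_2 = 2 row first, then the col_2 = 4 row.

-- SESSION 1
UPDATE t_test SET col_1 = 5 WHERE col_2 = 2;
UPDATE t_test SET col_1 = 8 WHERE col_2 = 4;
COMMIT;

-- SESSION 2 (same row order as SESSION 1, so it can only wait, never form a cycle)
UPDATE t_test SET col_1 = 7 WHERE col_2 = 2;
UPDATE t_test SET col_1 = 6 WHERE col_2 = 4;
COMMIT;
```

With this ordering, whichever session starts second simply waits on the first row until the other commits, and no ORA-00060 is raised.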

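Conclusion

In my opinion, the best way to know the session details of a deadlock is to log them as verbosely as possible. Otherwise, it is a nightmare for a DBA to investigate without the proper information on record. For that matter, even a developer will find it a herculean task to rectify and fix the actual design flaw if the deadlock error details are not logged thoroughly. To conclude with a one-liner: a deadlock is due to a design flaw; Oracle is just the victim and the application is the culprit. Deadlocks are scary, but they point out design flaws that must be rectified sooner or later.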
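As a rough illustration of the logging advice, the PL/SQL block below traps ORA-00060, prints the session ID, a timestamp and the error stack, and then rolls back. It is only a sketch: the exception name e_deadlock is made up for the example, the two updates are borrowed from SESSION 2 of the test case, and DBMS_OUTPUT stands in for whatever logging mechanism the application really uses.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Map ORA-00060 to a named exception so it can be caught explicitly
  e_deadlock EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_deadlock, -60);
BEGIN
  UPDATE t_test SET col_1 = 6 WHERE col_2 = 4;
  UPDATE t_test SET col_1 = 7 WHERE col_2 = 2;
  COMMIT;
EXCEPTION
  WHEN e_deadlock THEN
    -- Record who hit the deadlock and when, plus the error stack and backtrace
    DBMS_OUTPUT.PUT_LINE('Deadlock in SID ' || SYS_CONTEXT('USERENV', 'SID')
                         || ' at ' || TO_CHAR(SYSTIMESTAMP, 'YYYY-MM-DD HH24:MI:SS.FF3'));
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK);
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    -- Oracle undid only the failing statement; roll back the rest to release the row locks
    ROLLBACK;
END;
/
```

Rolling back in the handler is a design choice for this sketch; as the test case shows, committing the earlier statement instead is also possible.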

Answer 1 (score: 0)

User 1

update table_c set id = 200 where id = 13;
BEGIN
DBMS_LOCK.sleep(14);
END;
/
update table_c set id = 200 where id = 15;

User 2

update table_c set id = 2000 where id = 15;

BEGIN
DBMS_LOCK.sleep(14);
END;
/

update table_c set id = 1000 where id = 13;