Oracle SQL - 匹配时,更新并插入

时间:2017-01-16 10:21:15

标签: sql oracle merge

我在这里发现了一些类似的问题,但似乎没有一个问题符合我的情况。

我正在制作的表格是这样的,它是一张记录学生课程表现的表格:

| STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE | MARK | STATUS  | VERSION |
|            |           |                 |      |         |         |
| 1234       | 5678      | Mandatory       | 70   | ACTIVE  | 2       |
| 1234       | 5678      | Optional        | 70   | HISTORY | 1       |
| 1234       | 5678      | Optional        | null | HISTORY | 0       |
| 9876       | 4597      | Institutional   | 99   | ACTIVE  | 1       |
| 9876       | 4597      | Institutional   | null | HISTORY | 0       |

我需要将其与另一个表合并,该表根据他们的组跟踪学生的注册情况,以便我可以根据需要插入或更新行:

| GROUP_ID | STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE |
| 4976555  | 1234       | 5678      | Mandatory       |
| 6399875  | 1234       | 9034      | Optional        |
| 6399875  | 9876       | 4597      | Institutional   |

长话短说,我需要检查注册类型是否相同或已经更改,因为有些学生根据他们所属的小组注册课程,但这些小组可以在一夜之间改变。

直到这里都很好,但如果行匹配,我还需要复制我要更新的行并将其设置为" HISTORY",以便我们可以保留记录已对某一行进行的所有更新。

目前,我有一个典型的

MERGE INTO performance USING group_enrollments
ON performance.STUDENT_ID = group_enrollments.STUDENT_ID 
AND performance.COURSE_ID = group_enrollments.COURSE_ID
WHEN MATCHED THEN UPDATE ......
WHEN NOT MATCHED THEN INSERT ......

在我之前处理这部分代码的人认为将所有行复制为" HISTORY"是一个好主意。在进行合并之前,这会给我们带来问题,因为此过程每晚运行并且每次写入超过150.000行。 有什么想法吗?

16/01 12.34:更新了有关表格及其关系的更多信息

1 个答案:

答案 0 :(得分:1)

好的,首先,我会编写一个查询,生成要更新和/或插入的行:

WITH    performance AS (SELECT 1234 student_id, 5678 course_id, 'Mandatory' enrollment_type, 70 mark, 'ACTIVE' status, 2 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, 70 mark, 'HISTORY' status, 1 VERSION FROM dual UNION ALL
                        SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, 99 mark, 'ACTIVE' status, 1 VERSION FROM dual UNION ALL
                        SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual),
  group_enrollments AS (SELECT 4976555 group_id, 1234 student_id, 5678 course_id, 'Mandatory2' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 1234 student_id, 9034 course_id, 'Optional' enrollment_type FROM dual UNION ALL
                        SELECT 6399875 group_id, 9876 student_id, 4597 course_id, 'Institutional' enrollment_type FROM dual)
-- end of mimicking your tables with data in them
SELECT res.student_id,
       res.course_id,
       CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
            WHEN dummy.id = 2 THEN res.old_enrollment_type
       END enrollment_type,
       res.mark,
       CASE WHEN dummy.id = 1 THEN 'ACTIVE'
            WHEN dummy.id = 2 THEN 'HISTORY'
       END status,
       CASE WHEN dummy.id = 1 THEN res.new_version
            WHEN dummy.id = 2 THEN res.old_version
       END VERSION 
FROM   (SELECT ge.student_id,
               ge.course_id,
               ge.enrollment_type new_enrollment_type,
               p.enrollment_type old_enrollment_type,
               p.mark,
               p.status,
               p.version old_version,
               nvl(p.version + 1, 0) new_VERSION
                 -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
        FROM   group_enrollments ge
               LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                AND ge.course_id = p.course_id
        WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
        AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
        INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                    SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                    OR (dummy.id = 2 
                                                        AND res.status = 'ACTIVE');

STUDENT_ID  COURSE_ID ENROLLMENT_TYPE       MARK STATUS     VERSION
---------- ---------- --------------- ---------- ------- ----------
      1234       5678 Mandatory2              70 ACTIVE           3
      1234       9034 Optional                   ACTIVE           0
      1234       5678 Mandatory               70 HISTORY          2

此查询首先查找全新的行(即group_enrollment表中不在性能表中有行的行)或具有不同的enrollment_type。这些是需要插入或更新的行。

一旦我们知道了,我们就可以加入一个虚拟的2行表格表,这样我们就可以一直加入第一个虚拟行,无论我们是否需要插入或更新,但我们都会这样做如果我们需要更新,则只加入第二个虚拟行。这意味着我们只有一行用于插入,但有两行用于更新。

然后根据dummy.id(第一个虚拟行的新值,第二个虚拟行的旧值)输出正确的值很容易。

一旦我们完成了这项工作,我们就知道需要将哪些数据合并到性能表中,所以现在merge语句看起来像:

merge into performance tgt
  using (SELECT res.student_id,
                res.course_id,
                CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
                     WHEN dummy.id = 2 THEN res.old_enrollment_type
                END enrollment_type,
                res.mark,
                CASE WHEN dummy.id = 1 THEN 'ACTIVE'
                     WHEN dummy.id = 2 THEN 'HISTORY'
                END status,
                CASE WHEN dummy.id = 1 THEN res.new_version
                     WHEN dummy.id = 2 THEN res.old_version
                END VERSION 
         FROM   (SELECT ge.student_id,
                        ge.course_id,
                        ge.enrollment_type new_enrollment_type,
                        p.enrollment_type old_enrollment_type,
                        p.mark,
                        p.status,
                        p.version old_version,
                        nvl(p.version + 1, 0) new_VERSION
                          -- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
                 FROM   group_enrollments ge
                        LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
                                                         AND ge.course_id = p.course_id
                 WHERE  (p.status = 'ACTIVE' OR p.status IS NULL)
                 AND    (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
                 INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                             SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                             OR (dummy.id = 2 
                                                                 AND res.status = 'ACTIVE')) src
    ON (tgt.student_id = src.student_id AND tgt.course_id = src.course_id AND tgt.status = src.status)
WHEN MATCHED THEN
  UPDATE SET tgt.enrollment_type = src.enrollment_type,
             tgt.version = src.version
WHEN NOT MATCHED THEN
  INSERT (tgt.student_id, tgt.course_id, tgt.enrollment_type, tgt.mark, tgt.status, tgt.version)
  VALUES (src.student_id, src.course_id, src.enrollment_type, src.mark, src.status, src.version);

为了澄清,这里是一个非常简单的行条件重复示例(我们也可以将其称为部分交叉连接,因为一个表中的所有行都连接到另一个表中的至少一行):

WITH sample_data AS (SELECT 100 ID, NULL status FROM dual UNION ALL -- expect only one row
                     SELECT 101 ID, 'A' status FROM dual UNION ALL -- expect two rows
                     SELECT 102 ID, 'B' status FROM dual -- expect only one row
                    )
SELECT dummy.id dummy_row_id,
       sd.id,
       sd.status
FROM   sample_data sd
       INNER JOIN (SELECT 1 ID FROM dual UNION ALL
                   SELECT 2 ID FROM dual) dummy ON dummy.id = 1
                                                   OR (dummy.id = 2 
                                                       AND sd.status = 'A')
ORDER BY sd.id, dummy.id;

DUMMY_ROW_ID         ID STATUS
------------ ---------- ------
           1        100 
           1        101 A
           2        101 A
           1        102 B

你可以看到,对于sample_data" table"中的id = 101行,我们有两行,但其他两个id每行只有一行。

希望能为你澄清一些事情吗?