我在这里发现了一些类似的问题,但似乎没有一个问题符合我的情况。
我正在制作的表格是这样的,它是一张记录学生课程表现的表格:
| STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE | MARK | STATUS | VERSION |
| | | | | | |
| 1234 | 5678 | Mandatory | 70 | ACTIVE | 2 |
| 1234 | 5678 | Optional | 70 | HISTORY | 1 |
| 1234 | 5678 | Optional | null | HISTORY | 0 |
| 9876 | 4597 | Institutional | 99 | ACTIVE | 1 |
| 9876 | 4597 | Institutional | null | HISTORY | 0 |
我需要将其与另一个表合并,该表根据他们的组跟踪学生的注册情况,以便我可以根据需要插入或更新行:
| GROUP_ID | STUDENT_ID | COURSE_ID | ENROLLMENT_TYPE |
| 4976555 | 1234 | 5678 | Mandatory |
| 6399875 | 1234 | 9034 | Optional |
| 6399875 | 9876 | 4597 | Institutional |
长话短说,我需要检查注册类型是否相同或已经更改,因为有些学生根据他们所属的小组注册课程,但这些小组可以在一夜之间改变。
直到这里都很好,但如果行匹配,我还需要复制我要更新的行并将其设置为" HISTORY",以便我们可以保留记录已对某一行进行的所有更新。
目前,我有一个典型的
MERGE INTO performance USING group_enrollments
ON performance.STUDENT_ID = group_enrollments.STUDENT_ID
AND performance.COURSE_ID = group_enrollments.COURSE_ID
WHEN MATCHED THEN UPDATE ......
WHEN NOT MATCHED THEN INSERT ......
在我之前处理这部分代码的人认为将所有行复制为" HISTORY"是一个好主意。在进行合并之前,这会给我们带来问题,因为此过程每晚运行并且每次写入超过150.000行。 有什么想法吗?
16/01 12.34:更新了有关表格及其关系的更多信息
答案 0 :(得分:1)
好的,首先,我会编写一个查询,生成要更新和/或插入的行:
WITH performance AS (SELECT 1234 student_id, 5678 course_id, 'Mandatory' enrollment_type, 70 mark, 'ACTIVE' status, 2 VERSION FROM dual UNION ALL
SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, 70 mark, 'HISTORY' status, 1 VERSION FROM dual UNION ALL
SELECT 1234 student_id, 5678 course_id, 'Optional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual UNION ALL
SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, 99 mark, 'ACTIVE' status, 1 VERSION FROM dual UNION ALL
SELECT 9876 student_id, 4597 course_id, 'Institutional' enrollment_type, NULL mark, 'HISTORY' status, 0 VERSION FROM dual),
group_enrollments AS (SELECT 4976555 group_id, 1234 student_id, 5678 course_id, 'Mandatory2' enrollment_type FROM dual UNION ALL
SELECT 6399875 group_id, 1234 student_id, 9034 course_id, 'Optional' enrollment_type FROM dual UNION ALL
SELECT 6399875 group_id, 9876 student_id, 4597 course_id, 'Institutional' enrollment_type FROM dual)
-- end of mimicking your tables with data in them
SELECT res.student_id,
res.course_id,
CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
WHEN dummy.id = 2 THEN res.old_enrollment_type
END enrollment_type,
res.mark,
CASE WHEN dummy.id = 1 THEN 'ACTIVE'
WHEN dummy.id = 2 THEN 'HISTORY'
END status,
CASE WHEN dummy.id = 1 THEN res.new_version
WHEN dummy.id = 2 THEN res.old_version
END VERSION
FROM (SELECT ge.student_id,
ge.course_id,
ge.enrollment_type new_enrollment_type,
p.enrollment_type old_enrollment_type,
p.mark,
p.status,
p.version old_version,
nvl(p.version + 1, 0) new_VERSION
-- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
FROM group_enrollments ge
LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
AND ge.course_id = p.course_id
WHERE (p.status = 'ACTIVE' OR p.status IS NULL)
AND (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND res.status = 'ACTIVE');
STUDENT_ID COURSE_ID ENROLLMENT_TYPE MARK STATUS VERSION
---------- ---------- --------------- ---------- ------- ----------
1234 5678 Mandatory2 70 ACTIVE 3
1234 9034 Optional ACTIVE 0
1234 5678 Mandatory 70 HISTORY 2
此查询首先查找全新的行(即group_enrollment表中不在性能表中有行的行)或具有不同的enrollment_type。这些是需要插入或更新的行。
一旦我们知道了,我们就可以加入一个虚拟的2行表格表,这样我们就可以一直加入第一个虚拟行,无论我们是否需要插入或更新,但我们都会这样做如果我们需要更新,则只加入第二个虚拟行。这意味着我们只有一行用于插入,但有两行用于更新。
然后根据dummy.id(第一个虚拟行的新值,第二个虚拟行的旧值)输出正确的值很容易。
一旦我们完成了这项工作,我们就知道需要将哪些数据合并到性能表中,所以现在merge语句看起来像:
merge into performance tgt
using (SELECT res.student_id,
res.course_id,
CASE WHEN dummy.id = 1 THEN res.new_enrollment_type
WHEN dummy.id = 2 THEN res.old_enrollment_type
END enrollment_type,
res.mark,
CASE WHEN dummy.id = 1 THEN 'ACTIVE'
WHEN dummy.id = 2 THEN 'HISTORY'
END status,
CASE WHEN dummy.id = 1 THEN res.new_version
WHEN dummy.id = 2 THEN res.old_version
END VERSION
FROM (SELECT ge.student_id,
ge.course_id,
ge.enrollment_type new_enrollment_type,
p.enrollment_type old_enrollment_type,
p.mark,
p.status,
p.version old_version,
nvl(p.version + 1, 0) new_VERSION
-- n.b. this may produce duplicates or unique constraint errors in a concurrent environment
FROM group_enrollments ge
LEFT OUTER JOIN PERFORMANCE p ON ge.student_id = p.student_id
AND ge.course_id = p.course_id
WHERE (p.status = 'ACTIVE' OR p.status IS NULL)
AND (p.enrollment_type != ge.enrollment_type OR p.enrollment_type IS NULL)) res
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND res.status = 'ACTIVE')) src
ON (tgt.student_id = src.student_id AND tgt.course_id = src.course_id AND tgt.status = src.status)
WHEN MATCHED THEN
UPDATE SET tgt.enrollment_type = src.enrollment_type,
tgt.version = src.version
WHEN NOT MATCHED THEN
INSERT (tgt.student_id, tgt.course_id, tgt.enrollment_type, tgt.mark, tgt.status, tgt.version)
VALUES (src.student_id, src.course_id, src.enrollment_type, src.mark, src.status, src.version);
为了澄清,这里是一个非常简单的行条件重复示例(我们也可以将其称为部分交叉连接,因为一个表中的所有行都连接到另一个表中的至少一行):
WITH sample_data AS (SELECT 100 ID, NULL status FROM dual UNION ALL -- expect only one row
SELECT 101 ID, 'A' status FROM dual UNION ALL -- expect two rows
SELECT 102 ID, 'B' status FROM dual -- expect only one row
)
SELECT dummy.id dummy_row_id,
sd.id,
sd.status
FROM sample_data sd
INNER JOIN (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual) dummy ON dummy.id = 1
OR (dummy.id = 2
AND sd.status = 'A')
ORDER BY sd.id, dummy.id;
DUMMY_ROW_ID ID STATUS
------------ ---------- ------
1 100
1 101 A
2 101 A
1 102 B
你可以看到,对于sample_data" table"中的id = 101行,我们有两行,但其他两个id每行只有一行。
希望能为你澄清一些事情吗?