我有下表(商标):
firstname lastname Mark
------------------------------
arun prasanth 40
ann antony 45
sruthy abc 41
new abc 47
arun prasanth 45
arun prasanth 49
ann antony 49
并希望添加一列来标记,如果具有特定列的记录出现了多次。结果是:
firstname lastname Mark MULTI_FLAG
----------------------------------------------
arun prasanth 40 1
ann antony 45 1
sruthy abc 41 0
new abc 47 0
arun prasanth 45 1
arun prasanth 49 1
ann antony 49 1
我可以通过以下 GROUP BY 查询获得结果:
SELECT M1.firstname
,M1.lastname
,M1.Mark
,M2.MULTI_COUNT
FROM Marks M1
JOIN (SELECT firstname, lastname, CASE WHEN COUNT (*) > 1 THEN 1 ELSE 0 END AS MULTI_COUNT
FROM Marks
GROUP BY firstname, lastname) M2
ON M2.firstname = M1.firstname AND M2.lastname = M1.lastname;
或者通过更漂亮的 PARTITION BY 查询:
SELECT
firstname,
lastname,
CASE WHEN COUNT(*) OVER (PARTITION BY
firstname,
lastname) > 1 THEN 1 ELSE 0 END AS MULTI_FLAG
FROM
Marks
在返回的类似大表上运行 GROUP BY 查询: 34 m 56 s 595 ms
在返回的类似大表上运行 PARTITION BY 查询:
我想知道:
编辑: Oracle版本 Oracle Database 11g企业版11.2.0.3.0版-64位生产 PL / SQL版本11.2.0.3.0-生产 “ CORE 11.2.0.3.0生产” 适用于Linux的TNS:版本11.2.0.3.0-生产 NLSRTL版本11.2.0.3.0-生产
按计划划分
PLAN_TABLE_OUTPUT
Plan hash value: 3822227444
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 668K| 90M| | 90429 (1)| 00:18:06 |
| 1 | WINDOW SORT | | 668K| 90M| 98M| 90429 (1)| 00:18:06 |
|* 2 | HASH JOIN RIGHT OUTER | | 668K| 90M| | 69340 (1)| 00:13:53 |
| 3 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM"
AND "PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE
'INPUT%')
按计划分组
PLAN_TABLE_OUTPUT
Plan hash value: 648231064
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 912 | 2052K| | 226K (1)| 00:45:22 |
|* 1 | HASH JOIN | | 912 | 2052K| | 226K (1)| 00:45:22 |
| 2 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 89667 | 194M| 45M| 226K (1)| 00:45:22 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
| 9 | VIEW | | 668K| 1377M| | 86518 (1)| 00:17:19 |
| 10 | HASH GROUP BY | | 668K| 72M| 80M| 86518 (1)| 00:17:19 |
|* 11 | HASH JOIN RIGHT OUTER | | 668K| 72M| | 69340 (1)| 00:13:53 |
| 12 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 2478 | | 3 (0)| 00:00:01 |
| 13 | NESTED LOOPS | | | | | | |
| 14 | NESTED LOOPS | | 377K| 35M| | 69335 (1)| 00:13:53 |
| 15 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 16 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 1701 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("R2"."TRIAL_COUNTRY_CD"="CRM"."COUNTRY_CD" AND
UPPER("CRM"."COUNTRY")=UPPER("QCAB"."TRIAL_COUNTRY"))
3 - access("R2"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "R2"."ITERATION"="QCAB"."ITERATION" AND
"R2"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND "R2"."ASSUMPTION"="QCAB"."ASSUMPTION")
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
11 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
16 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
答案 0 :(得分:2)
通常,您从分析功能count(*)
开始,该功能会生成紧凑的SQL。
此方法的缺点是必须对数据进行排序(请参见WINDOW SORT
操作)。 GROUP BY
方法避免
可以使用HASH GROUP BY
进行排序,这样可以提高性能。
您的示例更加复杂,因为您不使用表,而是连接三个表的视图-对于GROUP BY
和明细数据,该连接执行了两次。哪一个
当然不是最佳的。
因此,我将从查询的解析函数版本开始(可以使用PARALLEL
选项)。
如果您想尝试GROUP BY
,则可以使用lightway版本:
1)仅将重复的键分组
2)使OUTER JOIN
分配MULTI_FLAG
下面具有执行计划的示例-使用数据进行简单测试
with dups as (
select firstname,lastname from tmp
group by firstname,lastname
having count(*) > 1)
select tmp.FIRSTNAME, tmp.LASTNAME, tmp.MARK,
case when dups.firstname is not NULL then 1 else 0 end as MULTI_FLAG
from tmp
left outer join dups on tmp.firstname = dups.firstname and tmp.lastname = dups.lastname;
您仍然需要两次访问视图,但是最终联接会更快(尤其是如果您只有很少的重复键)。
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 26M| | 1673 (1)| 00:00:21 |
|* 1 | HASH JOIN RIGHT OUTER| | 105K| 26M| 11M| 1673 (1)| 00:00:21 |
| 2 | VIEW | | 105K| 10M| | 128 (4)| 00:00:02 |
|* 3 | FILTER | | | | | | |
| 4 | HASH GROUP BY | | 105K| 10M| | 128 (4)| 00:00:02 |
| 5 | TABLE ACCESS FULL| TMP | 105K| 10M| | 125 (1)| 00:00:02 |
| 6 | TABLE ACCESS FULL | TMP | 105K| 15M| | 125 (1)| 00:00:02 |
--------------------------------------------------------------------------------------