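I have a complex Oracle view that returns data containing logical duplicates among the returned rows. My goal is to retrieve only one row when duplicates are found based on two columns (a text column and a datetime column). Which of the duplicates to return, however, is decided by a third column (also a datetime).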
I have mocked up the result set below as a table with stub data (available on SQLFiddle here):
CREATE TABLE TimeTable (
ID number NOT NULL,
NAME VARCHAR2(20) NOT NULL, -- Grouped by this first
TARGETVALUE INT, -- ultimate target value to be returned (no precedence from this value)
NOTE VARCHAR2(20) NULL, -- Just a note for the developer on StackOverflow
BEGIN_DATE TIMESTAMP NOT NULL, -- Grouped by this 2nd (down to the minute, not seconds)
APPROVAL_DATE TIMESTAMP NOT NULL -- Decides the ties for duplicates
);
-- ANSI TIMESTAMP literals avoid depending on the session's NLS_TIMESTAMP_FORMAT
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(1, 'Alpha', 5, 'Duplicate First', TIMESTAMP '2014-03-08 09:43:00',
TIMESTAMP '2014-03-09 09:43:00');
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(2, 'Alpha', 2, 'Duplicate Middle', TIMESTAMP '2014-03-08 09:43:00',
TIMESTAMP '2014-03-09 09:43:00');
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(3, 'Alpha', 3, 'Final Target', TIMESTAMP '2014-03-08 09:43:00',
TIMESTAMP '2014-03-09 10:00:00');
-- Same time as alpha, but not related.
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(4, 'Beta', 4, 'Only Target', TIMESTAMP '2014-03-08 09:43:30',
TIMESTAMP '2014-03-09 11:00:30');
The desired result set is 2 rows:
3, 'Alpha', 3, '08-MAR-14 09.43.00.000000000', '09-MAR-14 10.00.00.000000000'
4, 'Beta', 4, '08-MAR-14 09.43.30.000000000', '09-MAR-14 11.00.30.000000000'
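Note, to clarify: if I also had this row in the database
5, 'Alpha', 8, '09-MAR-14 09.43.00.000000000', '12-MAR-14 10.00.00.000000000'
then that Alpha row would be unique and would also be returned, because its different BEGIN_DATE
(March 9 instead of March 8) means it is not considered a duplicate.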
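The duplicate rule the question describes (NAME first, then BEGIN_DATE down to the minute, with the latest APPROVAL_DATE winning) can be sketched in plain Python. This is an illustration under assumptions, not Oracle code: timestamps are parsed from ISO-style strings, NOTE is omitted, and `dedupe` and `ROWS` are hypothetical names. The hypothetical row 5 from the note is included to show that a different BEGIN_DATE puts it in its own group.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical in-memory version of the stub data (NOTE column omitted).
Row = namedtuple("Row", "id name target_value begin_date approval_date")


def ts(text):
    """Parse an ISO-style 'YYYY-MM-DD HH:MM:SS' string (assumed format)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


ROWS = [
    Row(1, "Alpha", 5, ts("2014-03-08 09:43:00"), ts("2014-03-09 09:43:00")),
    Row(2, "Alpha", 2, ts("2014-03-08 09:43:00"), ts("2014-03-09 09:43:00")),
    Row(3, "Alpha", 3, ts("2014-03-08 09:43:00"), ts("2014-03-09 10:00:00")),
    Row(4, "Beta", 4, ts("2014-03-08 09:43:30"), ts("2014-03-09 11:00:30")),
    # Row 5 from the clarification note: same NAME, but BEGIN_DATE is March 9.
    Row(5, "Alpha", 8, ts("2014-03-09 09:43:00"), ts("2014-03-12 10:00:00")),
]


def dedupe(rows):
    """Keep one row per (NAME, BEGIN_DATE truncated to the minute)."""
    best = {}
    for row in rows:
        # The duplicate key: NAME plus BEGIN_DATE with seconds dropped.
        key = (row.name, row.begin_date.replace(second=0, microsecond=0))
        # A strictly later APPROVAL_DATE replaces the current winner for this key.
        if key not in best or row.approval_date > best[key].approval_date:
            best[key] = row
    return sorted(best.values(), key=lambda r: r.id)


for row in dedupe(ROWS):
    print(row.id, row.name, row.target_value)
```

With row 5 present the sketch keeps IDs 3, 4 and 5; with only the original four rows it keeps 3 and 4, matching the desired result set.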
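These are the rules to follow:
1. NAME relates the data.
2. BEGIN_DATE is the second relation: rows with the exact same time, down to the minute, are duplicates that need to be weeded out based on #3.
3. APPROVAL_DATE decides which duplicates are removed: the row with the latest APPROVAL_DATE wins over those with earlier dates.
Answer 0 (score: 2)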
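Aggregating the data according to the rules you mentioned is a simple application of ANALYTIC functions.
You want the MAX APPROVAL_DATE within each NAME, BEGIN_DATE group. So all you need is:
MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt
and, in your outer query, filter out the DUPLICATES with the predicate WHERE APPROVAL_DATE = max_appr_dt.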
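A minimal, self-contained sketch of the MAX approach, assuming SQLite (through Python's standard `sqlite3` module; window functions need SQLite 3.25 or later) as a stand-in for Oracle, with timestamps stored as ISO text so they sort chronologically. It is not the answerer's Oracle session. It drops the ORDER BY from the OVER clause: without ORDER BY, MAX covers the whole partition; with ORDER BY APPROVAL_DATE DESC, the default frame runs from the start of the partition (which holds the largest value) to the current row, so the result is the same.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption for a runnable demo).
SETUP = """
CREATE TABLE TimeTable (
    ID            INTEGER NOT NULL,
    NAME          TEXT    NOT NULL,
    TARGETVALUE   INTEGER,
    NOTE          TEXT,
    BEGIN_DATE    TEXT    NOT NULL,  -- ISO strings compare chronologically
    APPROVAL_DATE TEXT    NOT NULL
);
INSERT INTO TimeTable VALUES
    (1, 'Alpha', 5, 'Duplicate First',  '2014-03-08 09:43:00', '2014-03-09 09:43:00'),
    (2, 'Alpha', 2, 'Duplicate Middle', '2014-03-08 09:43:00', '2014-03-09 09:43:00'),
    (3, 'Alpha', 3, 'Final Target',     '2014-03-08 09:43:00', '2014-03-09 10:00:00'),
    (4, 'Beta',  4, 'Only Target',      '2014-03-08 09:43:30', '2014-03-09 11:00:30');
"""

# The inner query tags every row with its group's latest APPROVAL_DATE;
# the outer WHERE keeps only the rows that carry that date.
MAX_QUERY = """
SELECT ID, NAME, TARGETVALUE
FROM (SELECT t.*,
             MAX(APPROVAL_DATE) OVER (PARTITION BY NAME, BEGIN_DATE) AS max_appr_dt
      FROM TimeTable t)
WHERE APPROVAL_DATE = max_appr_dt
ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(conn.execute(MAX_QUERY).fetchall())
```

On the stub data this returns the two expected rows, IDs 3 and 4.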
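Note:
From a PERFORMANCE point of view, this approach does only one TABLE SCAN, so it is more efficient than joining the table to itself and doing multiple table scans.
Update: adding the complete test case, as requested in the comments.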
There are two ways to do it with analytic functions:
1. MAX
SQL> SELECT *
FROM
  (SELECT A.*,
          MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt
   FROM TIMETABLE A
  )
WHERE approval_date = max_appr_dt
/
ID NAME TARGETVALUE NOTE BEGIN_DATE APPROVAL_DATE MAX_APPR_DT
---------- -------------------- ----------- -------------------- ------------------------------ ------------------------------ ------------------------------
3 Alpha 3 Final Target 08-MAR-14 09.43.00.000000 AM 09-MAR-14 10.00.00.000000 AM 09-MAR-14 10.00.00.000000 AM
4 Beta 4 Only Target 08-MAR-14 09.43.30.000000 AM 09-MAR-14 11.00.30.000000 AM 09-MAR-14 11.00.30.000000 AM
2. ROW_NUMBER()
SQL> SELECT *
FROM
  (SELECT a.*,
          row_number() OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) AS "RNK"
   FROM TIMETABLE A
  )
WHERE rnk = 1
/
ID NAME TARGETVALUE NOTE BEGIN_DATE APPROVAL_DATE RNK
---------- -------------------- ----------- -------------------- ------------------------------ ------------------------------ ----------
3 Alpha 3 Final Target 08-MAR-14 09.43.00.000000 AM 09-MAR-14 10.00.00.000000 AM 1
4 Beta 4 Only Target 08-MAR-14 09.43.30.000000 AM 09-MAR-14 11.00.30.000000 AM 1
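One difference between the two queries does not show up in this data: ties on the latest APPROVAL_DATE. The sketch below again assumes SQLite via Python's `sqlite3` as a stand-in for Oracle, and adds a hypothetical row with ID 6 that ties ID 3. The MAX filter keeps every tied row, while ROW_NUMBER keeps exactly one; which tied row it keeps is unspecified unless the ORDER BY includes a tiebreaker column.

```python
import sqlite3

# SQLite (3.25+) stands in for Oracle; ID 6 is a hypothetical row tying ID 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeTable (ID INTEGER, NAME TEXT, BEGIN_DATE TEXT, APPROVAL_DATE TEXT);
INSERT INTO TimeTable VALUES
    (1, 'Alpha', '2014-03-08 09:43:00', '2014-03-09 09:43:00'),
    (3, 'Alpha', '2014-03-08 09:43:00', '2014-03-09 10:00:00'),
    (6, 'Alpha', '2014-03-08 09:43:00', '2014-03-09 10:00:00'),
    (4, 'Beta',  '2014-03-08 09:43:30', '2014-03-09 11:00:30');
""")


def ids(sql):
    """Run a query and return its first column (ID) as a list."""
    return [row[0] for row in conn.execute(sql)]


# MAX + equality filter: every row tied for the latest APPROVAL_DATE survives.
max_ids = ids("""
    SELECT ID FROM (
        SELECT t.*, MAX(APPROVAL_DATE) OVER (PARTITION BY NAME, BEGIN_DATE) AS max_appr_dt
        FROM TimeTable t)
    WHERE APPROVAL_DATE = max_appr_dt ORDER BY ID""")

# ROW_NUMBER: exactly one row per group; which tied row wins is unspecified.
rn_ids = ids("""
    SELECT ID FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY NAME, BEGIN_DATE
                                       ORDER BY APPROVAL_DATE DESC) AS rnk
        FROM TimeTable t)
    WHERE rnk = 1 ORDER BY ID""")

# A tiebreaker column (here: highest ID wins) makes the choice deterministic.
tiebreak_ids = ids("""
    SELECT ID FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY NAME, BEGIN_DATE
                                       ORDER BY APPROVAL_DATE DESC, ID DESC) AS rnk
        FROM TimeTable t)
    WHERE rnk = 1 ORDER BY ID""")

print(max_ids, rn_ids, tiebreak_ids)
```

Here `max_ids` is `[3, 4, 6]`, `rn_ids` has exactly two entries, and `tiebreak_ids` is `[4, 6]`.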
Execution plans for both queries:
SQL> EXPLAIN PLAN FOR
SELECT *
FROM
  (SELECT A.*,
          MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt
   FROM TIMETABLE A
  )
WHERE approval_date = max_appr_dt
/
Explained.
SQL>
SQL> select * from table(dbms_xplan.display)
/
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2691156688
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 356 | 3 (0)| 00:00:01 |
|* 1 | VIEW | | 4 | 356 | 3 (0)| 00:00:01 |
| 2 | WINDOW SORT | | 4 | 304 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TIMETABLE | 4 | 304 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("APPROVAL_DATE"="MAX_APPR_DT")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
19 rows selected.
SQL>
SQL> EXPLAIN PLAN FOR
SELECT *
FROM
  (SELECT a.*,
          row_number() OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) AS "RNK"
   FROM TIMETABLE A
  )
WHERE rnk = 1
/
Explained.
SQL>
SQL> select * from table(dbms_xplan.display)
/
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3768566268
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 356 | 3 (0)| 00:00:01 |
|* 1 | VIEW | | 4 | 356 | 3 (0)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 4 | 304 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TIMETABLE | 4 | 304 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNK"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "NAME","BEGIN_DATE" ORDER BY
INTERNAL_FUNCTION("APPROVAL_DATE") DESC )<=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
21 rows selected.
Answer 1 (score: 0)
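I know you are using Oracle, but I tested this on SQL Server. The SQL should work on other databases too. Try my query; I'm not sure it is the most efficient approach. Let me know if it helps.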
-- Table aliases are written without AS, which Oracle does not accept for tables
select t.ID, t.name, t.targetvalue, t.begin_date, t.approval_date
from
(
select name, begin_date, max(approval_date) as approval_date
from timetable
group by name, begin_date
) mx
inner join timetable t
on mx.name = t.name and
mx.begin_date = t.begin_date and
mx.approval_date = t.approval_date
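A runnable sketch of this GROUP BY plus self-join shape, assuming SQLite via Python's standard `sqlite3` module as a stand-in for both Oracle and SQL Server, with ISO text timestamps. Unlike the analytic versions, this query reads `timetable` twice (once for the aggregate, once for the join). Like the MAX version, it returns every row that ties for the latest APPROVAL_DATE.

```python
import sqlite3

# SQLite stands in for Oracle / SQL Server (an assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timetable (ID INTEGER, NAME TEXT, TARGETVALUE INTEGER,
                        BEGIN_DATE TEXT, APPROVAL_DATE TEXT);
INSERT INTO timetable VALUES
    (1, 'Alpha', 5, '2014-03-08 09:43:00', '2014-03-09 09:43:00'),
    (2, 'Alpha', 2, '2014-03-08 09:43:00', '2014-03-09 09:43:00'),
    (3, 'Alpha', 3, '2014-03-08 09:43:00', '2014-03-09 10:00:00'),
    (4, 'Beta',  4, '2014-03-08 09:43:30', '2014-03-09 11:00:30');
""")

# Step 1 (mx): the latest APPROVAL_DATE per (name, begin_date).
# Step 2: join back to timetable to recover the full row carrying that date.
JOIN_QUERY = """
SELECT t.ID, t.NAME, t.TARGETVALUE
FROM (SELECT name, begin_date, MAX(approval_date) AS approval_date
      FROM timetable
      GROUP BY name, begin_date) mx
INNER JOIN timetable t
        ON mx.name = t.name
       AND mx.begin_date = t.begin_date
       AND mx.approval_date = t.approval_date
ORDER BY t.ID
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
join_ids = [row[0] for row in join_rows]
print(join_rows)
```

On the stub data it returns the same two rows (IDs 3 and 4) as the analytic queries.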
Bonus: if you want to create the question's table in SQL Server:
CREATE TABLE TimeTable (
ID int NOT NULL,
NAME VARCHAR(20) NOT NULL,
TARGETVALUE INT,
NOTE VARCHAR(20) NULL,
BEGIN_DATE datetime NOT NULL,
APPROVAL_DATE datetime NOT NULL
);
-- ISO 8601 literals (yyyy-mm-ddThh:mi:ss) are read the same regardless of SET DATEFORMAT or language
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(1, 'Alpha', 5, 'Duplicate First', '2014-03-08T09:43:00',
'2014-03-09T09:43:00');
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(2, 'Alpha', 2, 'Duplicate Middle', '2014-03-08T09:43:00',
'2014-03-09T09:43:00');
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(3, 'Alpha', 3, 'Final Target', '2014-03-08T09:43:00',
'2014-03-09T10:00:00');
-- Same time as alpha, but not related:
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values
(4, 'Beta', 4, 'Only Target', '2014-03-08T09:43:30',
'2014-03-09T11:00:30');