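I'm identifying duplicates based on multiple columns, but I've found that some records don't have all the data that make up my duplicate criteria, such as dob, age, and gender. So I'd like to partition by dob, but if it is null or doesn't match, partition by age, and if that is null or doesn't match, partition by gender. Is this possible?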
SELECT ID, V1, V2, V3, V4, CreatedDate
FROM (
    SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
           COUNT(*)
               OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
           COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
               OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match
    FROM T1
    INNER JOIN T2
        ON ( T1.ID = T2.ID )
    INNER JOIN T3
        ON ( T1.ID = T3.ID )
)
WHERE ct > 1
AND ct_date_match > 0
Will it work if I change my partition clause to this?
(PARTITION BY V1, V2, V3, V4,
    (case when dob is null then age end),
    (case when age is null then gender_id end))
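Answer 0 (score: 0)
@mathguy is right: if you had just tried it, you could have saved yourself some time.
It will work. Use the COALESCE function and make sure all of its arguments have the same type. Here is an example using int, varchar, date, and float: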
drop table deleteme_tbl;
create table deleteme_tbl ( a int not null, b varchar2(5) , c date, d float(6), e varchar2(20));
insert into deleteme_tbl(a,b,c,d,e) values( 1, 'B', date '2017-12-01', 1.55, 'First Record');
insert into deleteme_tbl(a,b,c,d,e) values( 2, null, date '2017-12-02', 2.55, 'Second Record');
insert into deleteme_tbl(a,b,c,d,e) values( 3, 'B', null, 1.55, 'Third Record');
insert into deleteme_tbl(a,b,c,d,e) values( 4, 'B',null, null, 'Fourth Record');
insert into deleteme_tbl(a,b,c,d,e) values( 5, 'B', date '2017-12-01', 1.55, 'Fifth Record');
commit;
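-- Every argument is converted to a string so that COALESCE sees a single type.
-- Column a is NOT NULL, so COALESCE always returns TO_CHAR(a) here.
SELECT a.*
     , COUNT (*)
           OVER (PARTITION BY COALESCE (
                     TO_CHAR (a)
                   , b
                   , TO_CHAR (c, 'YYYYMMDD')
                   , TO_CHAR (d)
                 ))
           cnt
FROM deleteme_tbl a;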
This results in:
A  B  C          D    E              CNT
1  B  12/1/2017  1.6  First Record   1
2     12/2/2017  2.6  Second Record  1
3  B             1.6  Third Record   1
4  B                  Fourth Record  1
5  B  12/1/2017  1.6  Fifth Record   1
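Because column a is declared NOT NULL, the COALESCE in the query above never falls through to its later arguments, which is why every row ends up alone in its partition. As a rough variation (not part of the original answer), dropping a from the argument list lets the fallback show up against the same deleteme_tbl data:

```sql
-- Variation on the query above: start the fallback chain at the nullable
-- column b, so rows with b IS NULL fall back to c, then to d.
SELECT t.*,
       COUNT(*) OVER (
           PARTITION BY COALESCE(t.b, TO_CHAR(t.c, 'YYYYMMDD'), TO_CHAR(t.d))
       ) AS cnt
FROM deleteme_tbl t
ORDER BY t.a;
```

With the five sample rows, rows 1, 3, 4 and 5 all have b = 'B' and should share one partition (cnt = 4), while row 2, whose b is null, falls back to its date and stays alone (cnt = 1).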
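Answer 1 (score: 0)
Instead of using separate CASE expressions, put COALESCE(dob, age, gender) in place of DOB and it should work. Make sure to include it in the query output as well, in case you want to see the values and check whether it is exactly what you need.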
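A minimal sketch of that suggestion, assuming Oracle and assuming dob is a DATE while age and gender_id are numbers (the question does not state the types). The people data below is hypothetical and built inline so the query runs on its own; each argument goes through TO_CHAR so that COALESCE receives a single type, and the key is exposed as dup_key so it can be inspected:

```sql
-- Hypothetical sample rows; column names follow the question, types are assumed.
WITH people AS (
    SELECT 1 AS id, DATE '1990-05-01' AS dob, 27 AS age, 1 AS gender_id FROM dual UNION ALL
    SELECT 2, DATE '1990-05-01', NULL, 1    FROM dual UNION ALL
    SELECT 3, NULL,              27,   2    FROM dual UNION ALL
    SELECT 4, NULL,              27,   NULL FROM dual UNION ALL
    SELECT 5, NULL,              NULL, 2    FROM dual
),
keyed AS (
    -- First non-null of dob, age, gender_id, all converted to strings.
    SELECT p.*,
           COALESCE(TO_CHAR(p.dob, 'YYYYMMDD'),
                    TO_CHAR(p.age),
                    TO_CHAR(p.gender_id)) AS dup_key
    FROM people p
)
SELECT k.*,
       COUNT(*) OVER (PARTITION BY k.dup_key) AS cnt
FROM keyed k
ORDER BY k.id;
```

Here rows 1 and 2 share a dob and should be counted together, rows 3 and 4 fall back to age and should be counted together, and row 5 falls back to gender_id. Two caveats apply. This only covers the "is null" fallback: row 1 and row 3 have the same age but are not grouped, because row 1 is keyed by its dob, so a "doesn't match" fallback would need a different approach. Also, once everything is a string, a value from one column could coincide with a value from another (an age of 2 and a gender_id of 2, for instance), so checking dup_key in the output, as the answer recommends, is worthwhile.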