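I have a query that identifies "potential" duplicates based on multiple columns (V1, V2, V3, V4) in my database, but it returns so many matches that manual review is difficult. I would therefore like to rank the matches as follows:
a. If V5 and V6 match - rank 1
b. If V7 and V8 match - rank 2
and so on.
In addition to this, V1, V2, V3, V4 would still be matched as in my current query. Is this possible using dense_rank()?
My current query is: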
SELECT ID, V1, V2, V3, V4, CreatedDate
FROM (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match
  FROM   T1
         INNER JOIN T2
         ON ( T1.ID = T2.ID )
         INNER JOIN T3
         ON ( T1.ID = T3.ID )
)
WHERE ct > 1
AND   ct_date_match > 0
T1
ID | V1 | V2 | V5   | V6 | CreatedDate
---|----|----|------|----|------------
1  | A  | US | 1984 | QR | 01-AUG-2017
2  | B  | FR | 1991 | TY | 01-JAN-2017
3  | C  | AU | 1989 | GH | 25-SEP-2017
4  | B  | FR | 1995 | BN | 01-AUG-2017
5  | A  | US | 1984 | QR | 30-MAR-2016
6  | C  | AU | 1999 | MK | 14-JUN-2015
T2
ID | V3     | V7
---|--------|----
1  | Apple  | D12
1  | Kiwi   | S45
2  | Pear   | T23
3  | Banana | U78
4  | Pear   | T23
5  | Apple  | D12
6  | Banana | P90
T3
ID | V4      | V8
---|---------|-----
1  | Spinach | A678
1  | Beets   | V902
2  | Celery  | T456
3  | Radish  | Y675
4  | Celery  | T456
5  | Spinach | G890
6  | Celery  | F567
6  | Radish  | R453
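Current output:
ID | V1 | V2 | V3    | V4
---|----|----|-------|--------
1  | A  | US | Apple | Spinach
5  | A  | US | Apple | Spinach
2  | B  | FR | Pear  | Celery
4  | B  | FR | Pear  | Celery
Expected output:
ID | V1 | V2 | V3    | V4      | V5   | V6 | V7  | V8   | Rnk
---|----|----|-------|---------|------|----|-----|------|----
1  | A  | US | Apple | Spinach | 1984 | QR | D12 | A678 | 1
5  | A  | US | Apple | Spinach | 1984 | QR | D12 | G890 | 1
2  | B  | FR | Pear  | Celery  | 1991 | TY | T23 | T456 | 2
4  | B  | FR | Pear  | Celery  | 1995 | BN | T23 | T456 | 2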
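Answer 0 (score: 0)
I'm not entirely sure, but it looks like you can add partitioned counts of the distinct V5/V6/V7/V8 values to the inner query, and then evaluate them in the outer query. A count of 1 means every row in that V1, V2, V3, V4 group has the same value in that column: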
SELECT ID, V1, V2, V3, V4, CreatedDate,
       CASE WHEN ct_v5 = 1 AND ct_v6 = 1 THEN 1
            WHEN ct_v7 = 1 AND ct_v8 = 1 THEN 2
       END AS rnk
FROM (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match,
         COUNT( DISTINCT V5 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v5,
         COUNT( DISTINCT V6 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v6,
         COUNT( DISTINCT V7 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v7,
         COUNT( DISTINCT V8 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v8
  FROM   T1
         INNER JOIN T2
         ON ( T1.ID = T2.ID )
         INNER JOIN T3
         ON ( T1.ID = T3.ID )
)
WHERE ct > 1
AND   ct_date_match > 0;
With your data, that returns:
ID V V2 V3 V4 CREATEDDAT RNK
---------- - -- ------ ------- ---------- ----------
1 A US Apple Spinach 2017-08-01 1
5 A US Apple Spinach 2016-03-30 1
2 B FR Pear Celery 2017-01-01 2
4 B FR Pear Celery 2017-08-01 2
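That doesn't have a row for US/Kiwi/Beets, but the sample data only seems to contain one row with that combination, and the original query doesn't report it either.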
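For anyone who wants to try the idea without creating tables, here is a minimal self-contained sketch, assuming Oracle 11gR2 or later (for the column lists on the WITH clauses and for COUNT( DISTINCT ... ) as an analytic function). The sample rows are copied from the question. The CTE name grouped and the ELSE 3 catch-all are my own additions for illustration and are not part of the question or of the query above. It also selects V5 to V8 so the result has the same columns as the expected output.

```sql
-- Sample data from the question, inlined so the sketch runs on its own.
WITH T1 ( ID, V1, V2, V5, V6, CreatedDate ) AS (
  SELECT 1, 'A', 'US', 1984, 'QR', DATE '2017-08-01' FROM DUAL UNION ALL
  SELECT 2, 'B', 'FR', 1991, 'TY', DATE '2017-01-01' FROM DUAL UNION ALL
  SELECT 3, 'C', 'AU', 1989, 'GH', DATE '2017-09-25' FROM DUAL UNION ALL
  SELECT 4, 'B', 'FR', 1995, 'BN', DATE '2017-08-01' FROM DUAL UNION ALL
  SELECT 5, 'A', 'US', 1984, 'QR', DATE '2016-03-30' FROM DUAL UNION ALL
  SELECT 6, 'C', 'AU', 1999, 'MK', DATE '2015-06-14' FROM DUAL
),
T2 ( ID, V3, V7 ) AS (
  SELECT 1, 'Apple',  'D12' FROM DUAL UNION ALL
  SELECT 1, 'Kiwi',   'S45' FROM DUAL UNION ALL
  SELECT 2, 'Pear',   'T23' FROM DUAL UNION ALL
  SELECT 3, 'Banana', 'U78' FROM DUAL UNION ALL
  SELECT 4, 'Pear',   'T23' FROM DUAL UNION ALL
  SELECT 5, 'Apple',  'D12' FROM DUAL UNION ALL
  SELECT 6, 'Banana', 'P90' FROM DUAL
),
T3 ( ID, V4, V8 ) AS (
  SELECT 1, 'Spinach', 'A678' FROM DUAL UNION ALL
  SELECT 1, 'Beets',   'V902' FROM DUAL UNION ALL
  SELECT 2, 'Celery',  'T456' FROM DUAL UNION ALL
  SELECT 3, 'Radish',  'Y675' FROM DUAL UNION ALL
  SELECT 4, 'Celery',  'T456' FROM DUAL UNION ALL
  SELECT 5, 'Spinach', 'G890' FROM DUAL UNION ALL
  SELECT 6, 'Celery',  'F567' FROM DUAL UNION ALL
  SELECT 6, 'Radish',  'R453' FROM DUAL
),
-- "grouped" is a hypothetical name; it holds the per-group counts.
grouped AS (
  SELECT T1.ID, V1, V2, V3, V4, V5, V6, V7, V8, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match,
         -- A distinct count of 1 means the whole group shares one value.
         COUNT( DISTINCT V5 ) OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v5,
         COUNT( DISTINCT V6 ) OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v6,
         COUNT( DISTINCT V7 ) OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v7,
         COUNT( DISTINCT V8 ) OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v8
  FROM   T1
         INNER JOIN T2 ON ( T1.ID = T2.ID )
         INNER JOIN T3 ON ( T1.ID = T3.ID )
)
SELECT ID, V1, V2, V3, V4, V5, V6, V7, V8,
       CASE WHEN ct_v5 = 1 AND ct_v6 = 1 THEN 1
            WHEN ct_v7 = 1 AND ct_v8 = 1 THEN 2
            ELSE 3  -- assumption: catch-all for groups matching neither rule
       END AS rnk
FROM   grouped
WHERE  ct > 1
AND    ct_date_match > 0
ORDER BY rnk, V1, V2, V3, V4, ID;
```

On the dense_rank() part of the question: since each rule maps to a fixed number, a CASE expression seems a closer fit. DENSE_RANK() numbers whatever values are actually present without gaps, so if no group matched rule a, the rule b groups would be numbered 1 instead of 2. One caveat with this approach is that COUNT( DISTINCT ... ) ignores NULLs, so a group where the column is NULL in every row gets a count of 0 and is not treated as a match, while a group with one non-NULL value plus some NULLs gets a count of 1 and is treated as a match.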