Ranking data based on matches

Date: 2017-12-05 17:56:18

Tags: sql oracle

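I have a query that identifies "potential" duplicates based on several columns in my database (V1, V2, V3, V4). It returns a lot of matches, which makes manual review difficult, so I would like to assign a rank to the data based on the following:
a. If V5 and V6 match: rank 1
b. If V7 and V8 match: rank 2
and so on.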

In all cases, V1, V2, V3 and V4 must still match, as in my current query. Is this possible using dense_rank()?

My current query is:

SELECT ID, V1, V2, V3, V4, CreatedDate
FROM   (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match
  FROM   T1
         INNER JOIN T2
         ON ( T1.ID = T2.ID )
         INNER JOIN T3
         ON ( T1.ID = T3.ID )
)
WHERE  ct > 1
AND    ct_date_match > 0
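To check what the current query does with the sample data below without an Oracle instance, here is a minimal sketch that loads the T1/T2/T3 rows into an in-memory SQLite database through Python's `sqlite3` module and runs the query. Assumptions: SQLite (3.25 or later, for window functions) stands in for Oracle; `CreatedDate` is stored as ISO-8601 text, so `DATE '2017-08-01'` becomes the string `'2017-08-01'`; an `ORDER BY` is added for a deterministic row order; `load_sample` and `CURRENT_QUERY` are hypothetical names.

```python
import sqlite3

# Sample rows from the question's T1, T2 and T3 tables.
# CreatedDate is ISO-8601 text because SQLite has no DATE type.
T1_ROWS = [(1, "A", "US", 1984, "QR", "2017-08-01"), (2, "B", "FR", 1991, "TY", "2017-01-01"),
           (3, "C", "AU", 1989, "GH", "2017-09-25"), (4, "B", "FR", 1995, "BN", "2017-08-01"),
           (5, "A", "US", 1984, "QR", "2016-03-30"), (6, "C", "AU", 1999, "MK", "2015-06-14")]
T2_ROWS = [(1, "Apple", "D12"), (1, "Kiwi", "S45"), (2, "Pear", "T23"), (3, "Banana", "U78"),
           (4, "Pear", "T23"), (5, "Apple", "D12"), (6, "Banana", "P90")]
T3_ROWS = [(1, "Spinach", "A678"), (1, "Beets", "V902"), (2, "Celery", "T456"),
           (3, "Radish", "Y675"), (4, "Celery", "T456"), (5, "Spinach", "G890"),
           (6, "Celery", "F567"), (6, "Radish", "R453")]


def load_sample() -> sqlite3.Connection:
    """Create an in-memory database holding the three sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (ID INTEGER, V1 TEXT, V2 TEXT, V5 INTEGER, V6 TEXT, CreatedDate TEXT)")
    conn.execute("CREATE TABLE T2 (ID INTEGER, V3 TEXT, V7 TEXT)")
    conn.execute("CREATE TABLE T3 (ID INTEGER, V4 TEXT, V8 TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?, ?, ?, ?)", T1_ROWS)
    conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", T2_ROWS)
    conn.executemany("INSERT INTO T3 VALUES (?, ?, ?)", T3_ROWS)
    return conn


# The question's query, ported to SQLite: text date literal, plus ORDER BY.
CURRENT_QUERY = """
SELECT ID, V1, V2, V3, V4, CreatedDate
FROM (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*) OVER (PARTITION BY V1, V2, V3, V4) AS ct,
         COUNT(CASE CreatedDate WHEN '2017-08-01' THEN 1 END)
           OVER (PARTITION BY V1, V2, V3, V4) AS ct_date_match
  FROM T1
  INNER JOIN T2 ON T1.ID = T2.ID
  INNER JOIN T3 ON T1.ID = T3.ID
)
WHERE ct > 1 AND ct_date_match > 0
ORDER BY V1, ID
"""

if __name__ == "__main__":
    for row in load_sample().execute(CURRENT_QUERY):
        print(row)
```

This returns IDs 1, 5, 2 and 4, the same rows as the current output. Note that the joins fan out: ID 1 has two T2 rows and two T3 rows, so it contributes four joined rows, each landing in a different V1/V2/V3/V4 partition, and only the Apple/Spinach one has a partner.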

T1

ID | V1 | V2 | V5   | V6 | CreatedDate
---|----|----|------|----|------------
1  | A  | US | 1984 | QR | 01-AUG-2017
2  | B  | FR | 1991 | TY | 01-JAN-2017
3  | C  | AU | 1989 | GH | 25-SEP-2017
4  | B  | FR | 1995 | BN | 01-AUG-2017
5  | A  | US | 1984 | QR | 30-MAR-2016
6  | C  | AU | 1999 | MK | 14-JUN-2015

T2

ID | V3     | V7
---|--------|-----
1  | Apple  | D12
1  | Kiwi   | S45
2  | Pear   | T23
3  | Banana | U78
4  | Pear   | T23
5  | Apple  | D12
6  | Banana | P90

T3

ID | V4      | V8
---|---------|-----
1  | Spinach | A678
1  | Beets   | V902
2  | Celery  | T456
3  | Radish  | Y675
4  | Celery  | T456
5  | Spinach | G890
6  | Celery  | F567
6  | Radish  | R453

Current output:

1 A US Apple Spinach  
5 A US Apple Spinach  
2 B FR Pear  Celery  
4 B FR Pear  Celery

Expected output (the last column is Rnk):

1 A US Apple Spinach 1984 QR D12 A678 1    
5 A US Apple Spinach 1984 QR D12 G890 1    
2 B FR Pear  Celery  1991 TY T23 T456 2     
4 B FR Pear  Celery  1995 BN T23 T456 2 

1 answer:

Answer 0 (score: 0)

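I'm not entirely sure, but it looks like you could add partitioned counts of the distinct V5, V6, V7 and V8 values to the inner query, and then evaluate them in the outer query: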

SELECT ID, V1, V2, V3, V4, CreatedDate,
  CASE WHEN ct_v5 = 1 AND ct_v6 = 1 THEN 1
       WHEN ct_v7 = 1 AND ct_v8 = 1 THEN 2
  END AS rnk
FROM   (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match,
         COUNT( DISTINCT V5 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v5,
         COUNT( DISTINCT V6 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v6,
         COUNT( DISTINCT V7 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v7,
         COUNT( DISTINCT V8 )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_v8
  FROM   T1
         INNER JOIN T2
         ON ( T1.ID = T2.ID )
         INNER JOIN T3
         ON ( T1.ID = T3.ID )
)
WHERE  ct > 1
AND    ct_date_match > 0;
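The same approach can be tried against the sample data with a SQLite sketch like the one below. SQLite rejects `COUNT(DISTINCT ...)` as a window function, so this version swaps in `MIN(x) = MAX(x)` over the same partition. That comparison is true exactly when the partition holds one distinct non-NULL value, the same condition as `COUNT(DISTINCT x) = 1`, and it also works in Oracle. It additionally selects V5 to V8 so the result has the shape of the expected output in the question. As before, SQLite stands in for Oracle, the date literal is stored as text, and `load_sample` and `RANK_QUERY` are hypothetical names.

```python
import sqlite3

# Sample rows from the question's T1, T2 and T3 tables (dates as ISO text).
T1_ROWS = [(1, "A", "US", 1984, "QR", "2017-08-01"), (2, "B", "FR", 1991, "TY", "2017-01-01"),
           (3, "C", "AU", 1989, "GH", "2017-09-25"), (4, "B", "FR", 1995, "BN", "2017-08-01"),
           (5, "A", "US", 1984, "QR", "2016-03-30"), (6, "C", "AU", 1999, "MK", "2015-06-14")]
T2_ROWS = [(1, "Apple", "D12"), (1, "Kiwi", "S45"), (2, "Pear", "T23"), (3, "Banana", "U78"),
           (4, "Pear", "T23"), (5, "Apple", "D12"), (6, "Banana", "P90")]
T3_ROWS = [(1, "Spinach", "A678"), (1, "Beets", "V902"), (2, "Celery", "T456"),
           (3, "Radish", "Y675"), (4, "Celery", "T456"), (5, "Spinach", "G890"),
           (6, "Celery", "F567"), (6, "Radish", "R453")]


def load_sample() -> sqlite3.Connection:
    """Create an in-memory database holding the three sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (ID INTEGER, V1 TEXT, V2 TEXT, V5 INTEGER, V6 TEXT, CreatedDate TEXT)")
    conn.execute("CREATE TABLE T2 (ID INTEGER, V3 TEXT, V7 TEXT)")
    conn.execute("CREATE TABLE T3 (ID INTEGER, V4 TEXT, V8 TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?, ?, ?, ?)", T1_ROWS)
    conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", T2_ROWS)
    conn.executemany("INSERT INTO T3 VALUES (?, ?, ?)", T3_ROWS)
    return conn


# SQLite port of the answer's query. MIN(x) = MAX(x) over the partition
# replaces COUNT(DISTINCT x) = 1, which SQLite does not allow in a window.
RANK_QUERY = """
SELECT ID, V1, V2, V3, V4, V5, V6, V7, V8,
       CASE WHEN min_v5 = max_v5 AND min_v6 = max_v6 THEN 1
            WHEN min_v7 = max_v7 AND min_v8 = max_v8 THEN 2
       END AS rnk
FROM (
  SELECT T1.ID, V1, V2, V3, V4, V5, V6, V7, V8,
         COUNT(*) OVER (PARTITION BY V1, V2, V3, V4) AS ct,
         COUNT(CASE CreatedDate WHEN '2017-08-01' THEN 1 END)
           OVER (PARTITION BY V1, V2, V3, V4) AS ct_date_match,
         MIN(V5) OVER (PARTITION BY V1, V2, V3, V4) AS min_v5,
         MAX(V5) OVER (PARTITION BY V1, V2, V3, V4) AS max_v5,
         MIN(V6) OVER (PARTITION BY V1, V2, V3, V4) AS min_v6,
         MAX(V6) OVER (PARTITION BY V1, V2, V3, V4) AS max_v6,
         MIN(V7) OVER (PARTITION BY V1, V2, V3, V4) AS min_v7,
         MAX(V7) OVER (PARTITION BY V1, V2, V3, V4) AS max_v7,
         MIN(V8) OVER (PARTITION BY V1, V2, V3, V4) AS min_v8,
         MAX(V8) OVER (PARTITION BY V1, V2, V3, V4) AS max_v8
  FROM T1
  INNER JOIN T2 ON T1.ID = T2.ID
  INNER JOIN T3 ON T1.ID = T3.ID
)
WHERE ct > 1 AND ct_date_match > 0
ORDER BY V1, ID
"""

if __name__ == "__main__":
    for row in load_sample().execute(RANK_QUERY):
        print(row)
```

On the sample data this yields the four rows of the expected output, ranked 1, 1, 2, 2. Because the `CASE` has no `ELSE`, a duplicate group in which neither the V5/V6 pair nor the V7/V8 pair matches gets a NULL rank; add an `ELSE` branch if such groups should receive a rank of their own.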

With your data, this gives:

        ID V V2 V3     V4      CREATEDDAT        RNK
---------- - -- ------ ------- ---------- ----------
         1 A US Apple  Spinach 2017-08-01          1
         5 A US Apple  Spinach 2016-03-30          1
         2 B FR Pear   Celery  2017-01-01          2
         4 B FR Pear   Celery  2017-08-01          2

There is no row for US/Kiwi/Beets, but the sample data only seems to have one such set, and the original query does not report it either.