Comparing fields in a query result in Oracle

Time: 2019-04-01 18:11:08

Tags: oracle

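I am trying to fetch all records from a table where, for a particular PRIMARY_ID, FIRSTNAME and LASTNAME are the same but the BIRTHDATE values differ by 15 years or more.

Consider my table as below: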

_______________________________________________________________________________
| PRIMARY_ID | UNIQUE_ID | FIRSTNAME | LASTNAME | SUFFIX | BIRTHDATE          |
_______________________________________________________________________________
| 12345      | abcd      | john      | collin   | Mr     | 1975-10-01 00:00:00|
| 12345      | cdef      | john      | collin   | Mr     | 1960-10-01 00:00:00|
| 12345      | efgh      | john      | collin   | Mr     | 1975-10-01 00:00:00|
| 12345      | ghij      | john      | collin   | Mr     | 1960-10-01 00:00:00|
| 12345      | aaaa      | john      | collin   | Mr     | 1975-10-01 00:00:00|
| 12345      | bdfs      | john      | collin   | Mr     | 1975-10-01 00:00:00|
| 12345      | asdf      | john      | collin   | Mr     | null               |
| 12345      | dfgh      | john      | collin   | Mr     | null               |
| 23456      | ghij      | jeremy    | lynch    | Mr     | 1982-10-15 00:00:00|
| 23456      | aaaa      | jacob     | lynch    | Mr     | 1945-10-12 00:00:00|
| 23456      | bdfs      | jeremy    | lynch    | Mr     | 1945-10-12 00:00:00|
| 23456      | asdf      | jacob     | lynch    | Mr     | null               |
| 23456      | dfgh      | jeremy    | lynch    | Mr     | null               |
_______________________________________________________________________________

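In this table, for PRIMARY_ID 12345, FIRSTNAME and LASTNAME are the same on every row, but the BIRTHDATE differs by 15 years between UNIQUE_IDs. So this PRIMARY_ID needs to be pulled out. For PRIMARY_ID 23456, on the other hand, FIRSTNAME is not the same across all UNIQUE_ID records, so it must not be pulled out.

The table may contain NULL values for BIRTHDATE, and these should be ignored.

This is what I have tried so far: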

SELECT
  /*+ PARALLEL(16) */
  PRIMARY_ID,
  UNIQUE_ID,
  FIRSTNAME,
  LASTNAME,
  SUFFIX,
  BIRTHDATE,
  RANK() OVER ( ORDER BY FIRSTNAME, LASTNAME, SUFFIX, BIRTHDATE) "GROUP"
FROM TABLE;

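With this query I tried to form distinct groups, so that rows are differentiated by FIRSTNAME, LASTNAME and BIRTHDATE. I don't know how to proceed from here.

Can anyone help?

Note: the BIRTHDATE field is of varchar datatype, and I am using Oracle 12C.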

1 answer:

Answer 0 (score: 1)

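As I understand it, the goal is to return the distinct set of primary_ids for which adjacent (alphabetically ordered) unique_ids with the same firstname and lastname are separated by 15 years or more. As I understand it, NULLs should break the comparison and be treated as non-matches (otherwise primary_id 23456 would also match, via the pseudo-adjacent bdfs + ghij).

There are other ways to do this, but one approach available in 12c is pattern matching. An example is below. The example simply uses a difference of 5478 days to represent 15 years; if more precision is needed around leap days and the like, it may need some refinement.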

SELECT DISTINCT PRIMARY_ID
FROM THE_TABLE
    MATCH_RECOGNIZE (
        PARTITION BY PRIMARY_ID
        ORDER BY UNIQUE_ID
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN(FIFTEEN_DIFF)
        DEFINE FIFTEEN_DIFF AS
            (FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
                AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
                AND (ABS(EXTRACT( DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE,'YYYY-MM-DD HH24:MI:SS') - PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE,'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)));

Result:

  PRIMARY_ID
       12345


1 row selected.
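On the 5478-day threshold used in the query: 15 calendar years typically span 5478 or 5479 days, depending on how many leap days fall in between, so a fixed day count can be off by one at the boundary. A minimal sketch of a calendar-based check with MONTHS_BETWEEN, assuming the same 'YYYY-MM-DD HH24:MI:SS' string format as in the question (the two literals are taken from the sample rows; the aliases D1, D2, DAYS_APART and YEARS_APART are illustrative only):

```sql
-- Compare a plain day count with a calendar-based measure for two sample birthdates.
SELECT D1 - D2                          AS DAYS_APART,   -- DATE - DATE yields a number of days
       ABS(MONTHS_BETWEEN(D1, D2)) / 12 AS YEARS_APART   -- calendar-aware, unaffected by leap days
FROM (
    SELECT TO_DATE('1975-10-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AS D1,
           TO_DATE('1960-10-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AS D2
    FROM DUAL
);
```

For these two dates both measures agree (5478 days, exactly 15 years). If calendar precision matters, the day-count condition in the DEFINE clause could presumably be swapped for a check such as ABS(MONTHS_BETWEEN(...)) >= 180 on the converted dates; that substitution is a suggestion and has not been tested against the pattern-matching query.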

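The MATCH_RECOGNIZE query above does the following:

PARTITION BY looks at each PRIMARY_ID group separately.

ORDER BY then sorts each group by UNIQUE_ID, so only alphabetically adjacent records are compared.

Each record is then compared with the previous record. If the two share FIRSTNAME and LASTNAME and their BIRTHDATEs are 15 years or more apart (at least 5478 days), they count as a MATCH, and one row is returned to indicate this.

After any match is found, the query skips to the next row and resumes comparing.

Since only distinct matches are needed, DISTINCT is included in the select statement.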
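The same adjacent-row comparison can also be sketched with the analytic function LAG instead of pattern matching. This is an illustrative variant rather than part of the original answer. It assumes the same THE_TABLE and date format, and the aliases BIRTH_DT, PREV_FIRSTNAME, PREV_LASTNAME and PREV_BIRTH_DT are made up for the example. As in the main query, a NULL BIRTHDATE on either row is treated as a non-match.

```sql
-- Sketch: compare each row with the previous row (ordered by UNIQUE_ID)
-- within the same PRIMARY_ID, using LAG rather than MATCH_RECOGNIZE.
SELECT DISTINCT PRIMARY_ID
FROM (
    SELECT PRIMARY_ID,
           FIRSTNAME,
           LASTNAME,
           TO_DATE(BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS') AS BIRTH_DT,
           LAG(FIRSTNAME) OVER (PARTITION BY PRIMARY_ID ORDER BY UNIQUE_ID) AS PREV_FIRSTNAME,
           LAG(LASTNAME)  OVER (PARTITION BY PRIMARY_ID ORDER BY UNIQUE_ID) AS PREV_LASTNAME,
           LAG(TO_DATE(BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS'))
               OVER (PARTITION BY PRIMARY_ID ORDER BY UNIQUE_ID) AS PREV_BIRTH_DT
    FROM THE_TABLE
)
WHERE FIRSTNAME = PREV_FIRSTNAME
  AND LASTNAME  = PREV_LASTNAME
  -- DATE - DATE is a number of days; a NULL on either side makes this condition fail
  AND ABS(BIRTH_DT - PREV_BIRTH_DT) >= 5478;
```

For the sample data this should return the same PRIMARY_ID as the pattern-matching query, since both compare each row only with its immediate predecessor in UNIQUE_ID order.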

Edit:

Two further examples added in response to follow-up questions.

Alternative 1: pre-filter the NULLs. This brings different UNIQUE_IDs next to each other, which gives different matches.

SELECT DISTINCT PRIMARY_ID
FROM (SELECT PRIMARY_ID, UNIQUE_ID, FIRSTNAME, LASTNAME, SUFFIX, BIRTHDATE
      FROM THE_TABLE
      WHERE BIRTHDATE IS NOT NULL)
    MATCH_RECOGNIZE (
        PARTITION BY PRIMARY_ID
        ORDER BY UNIQUE_ID
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (FIFTEEN_DIFF)
        DEFINE FIFTEEN_DIFF AS
            (FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
                AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
                AND (ABS(EXTRACT(DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS') - PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)));

Result (now includes PRIMARY_ID 23456, because removing the NULLs brings two UNIQUE_IDs into sequence that are more than 15 years apart):

  PRIMARY_ID
       12345
       23456

2 rows selected.

Alternative 2: treat NULLs as matches

SELECT DISTINCT PRIMARY_ID
FROM THE_TABLE
    MATCH_RECOGNIZE (
        PARTITION BY PRIMARY_ID
        ORDER BY UNIQUE_ID
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (FIFTEEN_DIFF)
        DEFINE FIFTEEN_DIFF AS
            (FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
                AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
                AND ((ABS(EXTRACT(DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS') - PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE, 'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)
                    OR (LEAST(FIFTEEN_DIFF.BIRTHDATE, PREV(FIFTEEN_DIFF.BIRTHDATE)) IS NULL
                        AND COALESCE(FIFTEEN_DIFF.BIRTHDATE, PREV(FIFTEEN_DIFF.BIRTHDATE)) IS NOT NULL))));

Result: since a NULL next to a non-NULL BIRTHDATE now also counts as a match, PRIMARY_ID 23456 is returned as well.