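How to get a count of unique regex matches from a column - Oracle

Date: 2017-06-21 18:06:30

Tags: oracle

I want to get a count of the unique regular-expression matches in a column. (I am using Oracle DB.)

For example: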

table A 
Col1 Col2
1    test test test XXXX-1234 test XXXX-3456
2    test note note note XXXX-65577 test XXXX-1234
3    test note note note XXXX-9999 test test note

I need the result to be 4 [the unique numbers here are 1234, 3456, 65577, 9999].

I tried select sum(regexp_count(Col2, 'XXXX-[0-9]')) from A, but it returns a count of 6, because duplicate numbers are counted as well.

Please suggest a solution. Thanks.

2 answers:

Answer 0 (score: 2)

Without using regular expressions:
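The difference between the expected 4 and the reported total is the difference between counting every occurrence and counting distinct values. The minimal sketch below uses Python purely to illustrate that logic outside the database (it is not an Oracle solution); the rows are copied from the sample table. On these three rows exactly as posted, the plain total comes out as 5 rather than 6; the test data in the second answer, which adds a repeated XXXX-1234 to row 3, is what gives 6.

```python
import re

# Col2 values copied from the sample table A in the question.
rows = [
    "test test test XXXX-1234 test XXXX-3456",
    "test note note note XXXX-65577 test XXXX-1234",
    "test note note note XXXX-9999 test test note",
]

# Capture the digits that follow each "XXXX-" prefix.
pattern = re.compile(r"XXXX-(\d+)")

# Like SUM(REGEXP_COUNT(...)): every occurrence is counted, duplicates included.
total_matches = sum(len(pattern.findall(row)) for row in rows)

# Like COUNT(DISTINCT ...): collect the numbers in a set before counting.
distinct_numbers = {num for row in rows for num in pattern.findall(row)}

print(total_matches)          # 5
print(len(distinct_numbers))  # 4
```

Both SQL answers follow the second pattern: first turn each row into one row per extracted number, then apply COUNT(DISTINCT ...).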

CREATE TABLE A ( Col1, Col2 ) AS
  SELECT 1, 'test test test XXXX-1234 test XXXX-3456' FROM DUAL UNION ALL
  SELECT 2, 'test note note note XXXX-65577 test XXXX-1234' FROM DUAL UNION ALL
  SELECT 3, 'test note note note XXXX-9999 test test note' FROM DUAL;

Query:

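WITH start_pos ( Col2, start_pos ) AS (
  -- Anchor: position just after the first 'XXXX-' in each row.
  SELECT Col2, INSTR( Col2, 'XXXX-' ) + 5
  FROM   A
  WHERE  INSTR( Col2, 'XXXX-' ) > 0
UNION ALL
  -- Recursive step: position just after the next 'XXXX-', if there is one.
  SELECT Col2, INSTR( Col2, 'XXXX-', start_pos ) + 5
  FROM   start_pos
  WHERE  INSTR( Col2, 'XXXX-', start_pos ) > 0
),
end_pos ( Col2, start_pos, end_pos ) AS (
  -- Each number ends at the next space; 0 means it runs to the end of the string.
  SELECT Col2, start_pos, INSTR( Col2, ' ', start_pos )
  FROM   start_pos
)
-- Cut each number out of the string and count the distinct values.
SELECT COUNT( DISTINCT
         CASE end_pos
         WHEN 0 THEN SUBSTR( Col2, start_pos )
         ELSE SUBSTR( Col2, start_pos, end_pos - start_pos )
         END
       ) AS number_of_unique_values
FROM   end_pos;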
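The recursive CTE walks each string with INSTR and cuts the values out with SUBSTR. The same scan can be sketched in Python for illustration; the function name distinct_suffixes is hypothetical. Note that str.find returns -1 where INSTR returns 0, and Python string indexes are 0-based where Oracle's are 1-based.

```python
def distinct_suffixes(rows, prefix="XXXX-"):
    """Collect the distinct values that follow `prefix`, ending at a space or end of string."""
    values = set()
    for text in rows:
        pos = text.find(prefix)                 # like INSTR(Col2, 'XXXX-'), but -1 when absent
        while pos != -1:
            start = pos + len(prefix)           # like INSTR(...) + 5
            end = text.find(" ", start)         # like INSTR(Col2, ' ', start_pos)
            values.add(text[start:] if end == -1 else text[start:end])
            pos = text.find(prefix, start)      # look for the next prefix after this one
    return values


rows = [
    "test test test XXXX-1234 test XXXX-3456",
    "test note note note XXXX-65577 test XXXX-1234",
    "test note note note XXXX-9999 test test note",
]
result = distinct_suffixes(rows)
print(len(result))  # 4
```

Like the SQL, this assumes each number is followed by a space or the end of the string; any other delimiter would end up inside the extracted value.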

Output:

                NUMBER_OF_UNIQUE_VALUES
---------------------------------------
                                      4

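Answer 1 (score: 1)

First, you need to identify all the numeric fragments in each row; I use a standard method for that in the subquery. After that, getting the final answer with count(distinct ...) is trivial.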

with table_a ( col1, col2 ) as (
       select 1, 'test test test XXXX-1234 test XXXX-3456'       from dual union all
       select 2, 'test note note note XXXX-65577 test XXXX-1234' from dual union all
       select 3, 'test note note note XXXX-9999 test XXXX-1234'  from dual
     )
-- End of SIMULATED data (not part of the solution!) SQL query begins BELOW THIS LINE.
select count (distinct nbr) as distinct_nbr_count
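from   ( select regexp_substr(col2, '\d+', 1, level) as nbr   -- level-th run of digits in the row
         from   table_a
         connect by regexp_substr(col2, '\d+', 1, level) is not null
             and prior col1 = col1              -- stay within the same source row
             and prior sys_guid() is not null   -- non-deterministic term; avoids ORA-01436 (CONNECT BY loop)
       )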
       )
;

DISTINCT_NBR_COUNT
------------------
4
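Because '\d+' matches any run of digits, this query would also count numbers that are not preceded by XXXX-. The sketch below illustrates the difference in Python; the fourth row is a hypothetical addition, not part of the question's data. In Oracle 11g and later, REGEXP_SUBSTR also accepts a sixth subexpression argument, so an anchored variant such as regexp_substr(col2, 'XXXX-(\d+)', 1, level, null, 1) should return only the captured digits; treat that variant as an untested sketch.

```python
import re

rows = [
    "test test test XXXX-1234 test XXXX-3456",
    "test note note note XXXX-65577 test XXXX-1234",
    "test note note note XXXX-9999 test XXXX-1234",
    "ticket 42 XXXX-3456",  # hypothetical extra row containing a stray number
]

# Any run of digits, as with '\d+' in the query above.
any_digits = {m for row in rows for m in re.findall(r"\d+", row)}

# Only the digits captured after the "XXXX-" prefix.
prefixed = {m for row in rows for m in re.findall(r"XXXX-(\d+)", row)}

print(len(any_digits))  # 5 (the stray 42 is counted too)
print(len(prefixed))    # 4
```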