I want to get a count of the unique regular-expression matches in a column. (I am using Oracle DB.)
For example:
table A
Col1 Col2
1 test test test XXXX-1234 test XXXX-3456
2 test note note note XXXX-65577 test XXXX-1234
3 test note note note XXXX-9999 test test note
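I need the result to be 4 (here the unique numbers are 1234, 3456, 65577, 9999).
I tried select sum(regexp_count(Col2, 'XXXX:[0-9]')) from table A, but it returns a count of 6, which also includes the repeated numbers.
Please suggest a solution. Thanks.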
Answer 0 (score: 2)
Without using regular expressions:
CREATE TABLE A ( Col1, Col2 ) AS
SELECT 1, 'test test test XXXX-1234 test XXXX-3456' FROM DUAL UNION ALL
SELECT 2, 'test note note note XXXX-65577 test XXXX-1234' FROM DUAL UNION ALL
SELECT 3, 'test note note note XXXX-9999 test test note' FROM DUAL;
Query:
WITH start_pos ( Col2, start_pos ) AS (
SELECT Col2, INSTR( Col2, 'XXXX-' ) + 5
FROM A
WHERE INSTR( Col2, 'XXXX-' ) > 0
UNION ALL
SELECT Col2, INSTR( Col2, 'XXXX-', start_pos ) + 5
FROM start_pos
WHERE INSTR( Col2, 'XXXX-', start_pos ) > 0
),
end_pos ( Col2, start_pos, end_pos ) AS (
SELECT Col2, start_pos, INSTR( Col2, ' ', start_pos )
FROM start_pos
)
SELECT COUNT( DISTINCT
CASE end_pos
WHEN 0 THEN SUBSTR( Col2, start_pos )
ELSE SUBSTR( Col2, start_pos, end_pos - start_pos )
END
) AS number_of_unique_values
FROM end_pos;
Output:
NUMBER_OF_UNIQUE_VALUES
---------------------------------------
4
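To check which values are being counted, the same CTEs can list the distinct extracted values instead of counting them. This is only a sketch that swaps the final SELECT; it assumes table A from the setup above, and the alias extracted_value is a name chosen here for illustration.

```sql
-- Sketch: same recursive CTE as the query above, but listing the values.
WITH start_pos ( Col2, start_pos ) AS (
  -- Anchor member: position just after the first 'XXXX-' in each row.
  SELECT Col2, INSTR( Col2, 'XXXX-' ) + 5
  FROM   A
  WHERE  INSTR( Col2, 'XXXX-' ) > 0
UNION ALL
  -- Recursive member: position just after the next 'XXXX-', if there is one.
  SELECT Col2, INSTR( Col2, 'XXXX-', start_pos ) + 5
  FROM   start_pos
  WHERE  INSTR( Col2, 'XXXX-', start_pos ) > 0
),
end_pos ( Col2, start_pos, end_pos ) AS (
  -- The value ends at the next space; 0 means it runs to the end of the string.
  SELECT Col2, start_pos, INSTR( Col2, ' ', start_pos )
  FROM   start_pos
)
SELECT DISTINCT
       CASE end_pos
         WHEN 0 THEN SUBSTR( Col2, start_pos )
         ELSE SUBSTR( Col2, start_pos, end_pos - start_pos )
       END AS extracted_value  -- hypothetical alias, for illustration only
FROM   end_pos
ORDER BY extracted_value;
```

Note that the values are extracted as strings, so the ORDER BY sorts them as text rather than numerically.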
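Answer 1 (score: 1)
First, you need to identify all the numeric fragments in each row. I do that in a subquery, using a standard approach. After that, doing the count (distinct ....) for the final answer is trivial.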
with table_a ( col1, col2 ) as (
select 1, 'test test test XXXX-1234 test XXXX-3456' from dual union all
select 2, 'test note note note XXXX-65577 test XXXX-1234' from dual union all
select 3, 'test note note note XXXX-9999 test XXXX-1234' from dual
)
-- End of SIMULATED data (not part of the solution!) SQL query begins BELOW THIS LINE.
select count (distinct nbr) as distinct_nbr_count
from ( select regexp_substr(col2, '\d+', 1, level) as nbr
from table_a
connect by regexp_substr(col2, '\d+', 1, level) is not null
and prior col1 = col1
and prior sys_guid() is not null
)
;
DISTINCT_NBR_COUNT
------------------
4
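One caveat: the pattern '\d+' matches any run of digits, so a number that is not preceded by 'XXXX-' would also be counted. If that matters for the real data, a possible variation is to anchor the pattern to the prefix and return only the captured digits. The sketch below assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument that selects a subexpression; it reuses the simulated data from above.

```sql
-- Sketch: assumes Oracle 11g+ (subexpression argument of REGEXP_SUBSTR).
with table_a ( col1, col2 ) as (
  select 1, 'test test test XXXX-1234 test XXXX-3456'       from dual union all
  select 2, 'test note note note XXXX-65577 test XXXX-1234' from dual union all
  select 3, 'test note note note XXXX-9999 test XXXX-1234'  from dual
)
select count (distinct nbr) as distinct_nbr_count
from   (
         -- The final argument 1 returns only the first capture group,
         -- i.e. the digits that follow 'XXXX-'.
         select regexp_substr(col2, 'XXXX-(\d+)', 1, level, null, 1) as nbr
         from   table_a
         -- Generate one row per occurrence of the pattern within each source row.
         connect by regexp_substr(col2, 'XXXX-(\d+)', 1, level) is not null
                and prior col1 = col1
                and prior sys_guid() is not null
       )
;
```

For the sample rows this should again give 4, since every number there already carries the 'XXXX-' prefix.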