Regular expression to remove duplicates from a comma-separated string

Time: 2016-09-20 04:24:29

Tags: regex oracle

I have the following string:

'C,2,1,2,3,1'

I need a regular expression that removes the duplicates, so that the resulting string looks like this:

'C,2,1,3'
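To pin down the expected behaviour: this is an order-preserving de-duplication. The string is split on commas, the first occurrence of each token is kept, and the survivors are joined back in their original order. A minimal Python sketch of that logic, outside Oracle and only to illustrate the target result (`dedupe_csv` is a hypothetical helper name):

```python
def dedupe_csv(s: str) -> str:
    """Remove repeated comma-separated tokens, keeping the first occurrence of each."""
    tokens = s.split(",")
    # dict keeps insertion order (Python 3.7+), so the first occurrence of a token wins
    unique = dict.fromkeys(tokens)
    return ",".join(unique)


print(dedupe_csv("C,2,1,2,3,1"))  # C,2,1,3
```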

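3 answers:

Answer 0 (score: 1)

If your input data consists of multiple strings, I assume you have some kind of id column that distinguishes them. If no such column exists, you can create one in the first factored subquery, for example with rownum.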

with
     inputs ( id, str ) as (
       select 1, 'C,2,1,2,3,1'   from dual union all
       select 2, 'A,ZZ,3,A,3,ZZ' from dual
     ),
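     -- split each string into one row per token; lvl is the token's position
     unwrapped ( id, str, lvl, token ) as (
       select id, str, level, regexp_substr(str, '[^,]+', 1, level)
       from   inputs
       connect by level <= 1 + regexp_count(str, ',')
           and prior id = id                  -- stay within the same input string
           and prior sys_guid() is not null   -- avoid ORA-01436 (CONNECT BY loop)
     ),
     -- number the occurrences of each token within a string; rn = 1 is the first
     with_rn ( id, str, lvl, token, rn ) as (
       select id, str, lvl, token, row_number() over (partition by id, token order by lvl)
       from   unwrapped
     )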
select id, str, listagg(token, ',') within group (order by lvl) as new_str
from   with_rn
where  rn = 1
group by id, str
order by id
;


  ID STR                NEW_STR
---- ------------------ --------------------
   1 C,2,1,2,3,1        C,2,1,3
   2 A,ZZ,3,A,3,ZZ      A,ZZ,3
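The pipeline above (split into rows, number each token's occurrences per string, keep `rn = 1`, re-aggregate in `lvl` order) can be mirrored in Python to make each step explicit. This is a sketch of the logic only, not Oracle code; `INPUTS` and `dedupe_rows` are hypothetical names, and `re.findall(r"[^,]+", s)` stands in for the successive `regexp_substr(str, '[^,]+', 1, level)` calls:

```python
import re

# Hypothetical sample rows mirroring the "inputs" CTE: (id, str)
INPUTS = [(1, "C,2,1,2,3,1"), (2, "A,ZZ,3,A,3,ZZ")]


def dedupe_rows(rows):
    """Mimic unwrapped -> with_rn -> listagg, producing one new string per id."""
    result = {}
    for row_id, s in rows:
        # "unwrapped": one (lvl, token) pair per '[^,]+' match, lvl counted from 1
        tokens = list(enumerate(re.findall(r"[^,]+", s), start=1))
        # "with_rn" + "where rn = 1": keep a token only the first time it appears
        seen = set()
        kept = []
        for lvl, token in tokens:
            if token not in seen:
                seen.add(token)
                kept.append((lvl, token))
        # listagg(token, ',') within group (order by lvl)
        result[row_id] = ",".join(token for _, token in sorted(kept))
    return result


print(dedupe_rows(INPUTS))  # {1: 'C,2,1,3', 2: 'A,ZZ,3'}
```

Partitioning by `id` is what lets the query handle several strings at once; in the sketch that corresponds to resetting `seen` for each row.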

Answer 1 (score: 0)

Try this:

with 
    -- your input data
    t_in as (select 'C,2,1,2,3,1' as s from dual),
    -- your string split into rows, one row per list item
    t_split as (
        select regexp_substr(s, '(\w+)(,|$)', 1, level, 'c', 1) s,
               level n
        from   t_in
        connect by level <= regexp_count(s, '(\w+)(,|$)')
    ),
    -- group these rows to get the distinct values, each with
    -- its minimum level (first position) for sorting
    t_grouped as (
        select s, min(n) n from t_split group by s
    )
select listagg(s, ',') within group (order by n)
from   t_grouped;

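Depending on your Oracle version, you may need to replace listagg with wm_concat (LISTAGG requires Oracle 11g Release 2 or later; WM_CONCAT is undocumented and unsupported).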
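This answer takes a slightly different route: instead of numbering occurrences, it groups by token and orders the groups by each token's smallest position. A Python sketch of the same idea (not Oracle code; `dedupe_by_min_position` is a hypothetical name, and Python's `re` is assumed to treat this particular pattern the way Oracle does):

```python
import re


def dedupe_by_min_position(s: str) -> str:
    """Mirror t_split -> t_grouped -> listagg: order distinct tokens by first position."""
    # t_split: capture group 1 of '(\w+)(,|$)', positions numbered from 1 like LEVEL
    tokens = [m.group(1) for m in re.finditer(r"(\w+)(,|$)", s)]
    first_pos = {}
    for n, token in enumerate(tokens, start=1):
        # t_grouped: min(n) for each distinct token
        first_pos[token] = min(n, first_pos.get(token, n))
    # listagg(s, ',') within group (order by n)
    return ",".join(sorted(first_pos, key=first_pos.get))


print(dedupe_by_min_position("C,2,1,2,3,1"))  # C,2,1,3
```

One caveat worth checking against real data: because the pattern only accepts `\w` characters, a token such as `a-b` is not captured whole. In this sketch `dedupe_by_min_position("a-b,c")` returns `b,c`. The `[^,]+` pattern used in the other answers does not have that restriction.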

Answer 2 (score: 0)

Here is another, shorter solution (written for a single input string):

select listagg(val, ',') within group(order by min(id))
  from (select rownum as id,
               trim(regexp_substr(str, '[^,]+', 1, level)) as val
          from (select 'C,2,1,2,3,1' as str from dual)
        connect by regexp_substr(str, '[^,]+', 1, level) is not null)
 group by val;
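The shorter query relies on a nested aggregate: `group by val` collapses duplicates, `min(id)` gives each value's first position, and the outer `listagg` orders by that minimum. It also trims whitespace around each token. A Python sketch of the same steps (not Oracle code; `dedupe_trimmed` is a hypothetical name):

```python
import re


def dedupe_trimmed(s: str) -> str:
    """Mirror the inner query (rownum, trim(regexp_substr(...))) and the outer
    GROUP BY val with listagg ... ORDER BY min(id)."""
    # inner query: one row per '[^,]+' match, numbered from 1 like ROWNUM,
    # with surrounding whitespace removed
    rows = [(i, tok.strip()) for i, tok in enumerate(re.findall(r"[^,]+", s), start=1)]
    min_id = {}
    for row_id, val in rows:
        # rows arrive in increasing id, so the first one seen holds min(id)
        if val not in min_id:
            min_id[val] = row_id
    # listagg(val, ',') within group (order by min(id))
    return ",".join(sorted(min_id, key=min_id.get))


print(dedupe_trimmed("C, 2,1 ,2,3,1"))  # C,2,1,3
```

Two small differences to keep in mind: Python's `str.strip()` also removes tabs and newlines, whereas Oracle's `TRIM` removes only spaces by default; and because `[^,]+` counts matches rather than fields, an empty field such as the one in `A,,B` is skipped rather than returned as an empty token.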