Extracting comma-separated values with regex in Oracle

Time: 2013-12-13 07:29:47

Tags: regex oracle

I have the following input:

Str := "Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)"

The required output is:

AB123,MN456,xy789

I am using the following regular expression in Oracle:

SELECT TRIM (
          REGEXP_SUBSTR (
             'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
             '[[:alpha:]]{2}[[:digit:]]{3}',
             1,
             1,
             'i'))
  FROM DUAL;

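This returns only the value AB123; I want all of the values, comma-separated.

Please help.

Thanks in advance.

6 answers:

Answer 0: (score: 2)

Such complicated answers...

There is a simpler way: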

select rtrim(regexp_replace('Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
                            '([^\(]+?\(([[:alpha:]]{2}[[:digit:]]{3})\))','\2,',1,0,'i'),',')
from dual;

Hope this helps.

Edit: a slightly changed version, without the redundant outer capture group:

select rtrim(regexp_replace('Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
                            '[^\(]+?\(([[:alpha:]]{2}[[:digit:]]{3})\)','\1,',1,0,'i'),',')
from dual;

Answer 1: (score: 2)

I would do it like this; tried with Oracle 10.2:

SELECT regexp_replace
       (
        'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)'
       ,' ?\w+ \w+ ?\(([^)]+)\)'
       ,'\1'
       ) as col
  FROM dual;

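Answer 2: (score: 1)

SQL Fiddle

Query 1

Here is a way to do it using a regular expression replacement:

(with some edge cases to test: a NULL surname, a suffix added to the surname, and a double-barrelled surname)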

WITH strings AS (
            SELECT 'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)' AS str FROM   DUAL
  UNION ALL SELECT 'Madonna  (MA001), John Jones(Jr) (JJ001), Doctor Doctor(PhD) (dd001), Alf Double-Barrelled (AD001)' AS str FROM   DUAL
)
SELECT REGEXP_REPLACE( str, '.*?\(([[:alpha:]]{2}[[:digit:]]{3})\)\s*(,|$)', '\1\2' ) AS match
FROM   strings

Results

|                   MATCH |
|-------------------------|
|       AB123,MN456,xy789 |
| MA001,JJ001,dd001,AD001 |

Query 2

Here is a way to do it using a hierarchical query:

WITH str AS (
  SELECT 'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)' AS str
  FROM   DUAL
),
lengths AS (
  SELECT str,
         REGEXP_COUNT( str, '\(([[:alpha:]]{2}[[:digit:]]{3})\)\s*(,|$)' ) AS len
  FROM   str
)
SELECT SUBSTR(
         SYS_CONNECT_BY_PATH (
           REGEXP_SUBSTR (
               str,
               '\(([[:alpha:]]{2}[[:digit:]]{3})\)\s*(,|$)',
               1,
               LEVEL,
               NULL,
               1
           ),
           ','
         ),
         2
       ) AS match
FROM lengths
WHERE LEVEL = len
CONNECT BY LEVEL <= len

Results

|             MATCH |
|-------------------|
| AB123,MN456,xy789 |

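Query 3

If you are using a version of Oracle that predates REGEXP_COUNT, you can use a combination of LENGTH and REGEXP_REPLACE instead. The query below computes the count both ways so the two can be compared: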

WITH str AS (
  SELECT 'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)' AS str
  FROM   DUAL
)
SELECT str,
       REGEXP_COUNT( str, '\(([[:alpha:]]{2}[[:digit:]]{3})\)\s*(,|$)' ) AS len,
       LENGTH( REGEXP_REPLACE( str, '.*?\(([[:alpha:]]{2}[[:digit:]]{3})\)\s*(,|$)', 'X' )) AS len2
FROM   str

Results

|                                                                   STR | LEN | LEN2 |
|-----------------------------------------------------------------------|-----|------|
| Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789) |   3 |    3 |

Answer 3: (score: 0)

Here is a regex that uses capture group 1 to get the contents of the parentheses:

\((.{5})\)

See it here: http://regexr.com?37kq7
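For reference, one way this pattern might be applied in Oracle itself is through REGEXP_REPLACE, in the same spirit as the other answers. This is only a sketch: the leading `[^(]*` (added to swallow the text before each parenthesis) and the trailing RTRIM are assumptions layered on top of the regex above, not part of it.

```sql
-- Sketch: keep only capture group 1 of each "(xxxxx)" and append a comma,
-- then strip the trailing comma left after the last match.
SELECT RTRIM(
         REGEXP_REPLACE(
           'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
           '[^(]*\((.{5})\)',   -- text before "(" plus five characters in parentheses
           '\1,'                -- back-reference to the five captured characters
         ),
         ','
       ) AS codes
FROM   DUAL;
-- For the sample string this should yield: AB123,MN456,xy789
```

Note that `.{5}` accepts any five characters between the parentheses, so it is looser than the `[[:alpha:]]{2}[[:digit:]]{3}` pattern used in the question.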

Answer 4: (score: 0)

Admittedly a bit ugly, and it only works on Oracle versions >= 11.2 (because LISTAGG was introduced in that release):

SELECT LISTAGG(COL1, ',') WITHIN GROUP(ORDER BY 1) RESULT
  FROM (SELECT TRIM(REGEXP_SUBSTR('Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
                                  '[[:alpha:]]{2}[[:digit:]]{3}',
                                  1,
                                  ROWNUM,
                                  'i')) COL1
          FROM DUAL
        CONNECT BY LEVEL <= REGEXP_COUNT('Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
                                         '[[:alpha:]]{2}[[:digit:]]{3}',
                                         1,
                                         'i'));

RESULT
--------------------------------------------------------------------------------
AB123,MN456,xy789

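Note: the above works for any number of occurrences of the pattern in the input string.

Update: for versions 9i, 10g and 11.1 you can use the STRAGG user function provided by Tom Kyte. Also, as mentioned in the comments, there is the WM_CONCAT function.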
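One caveat about the LISTAGG query in this answer: to my understanding, `ORDER BY 1` inside WITHIN GROUP is treated as a constant rather than a column position, so the order of the concatenated values is not strictly guaranteed. A possible variant, sketched here with hypothetical aliases (`input`, `pos`, `code`), carries LEVEL out of the subquery and orders by it:

```sql
-- Sketch: same idea as above, but ordered explicitly by match position.
WITH input AS (
  SELECT 'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)' AS str
  FROM   DUAL
)
SELECT LISTAGG( code, ',' ) WITHIN GROUP ( ORDER BY pos ) AS result
FROM (
  SELECT LEVEL AS pos,   -- 1, 2, 3, ... one row per occurrence
         REGEXP_SUBSTR( str, '[[:alpha:]]{2}[[:digit:]]{3}', 1, LEVEL, 'i' ) AS code
  FROM   input
  CONNECT BY LEVEL <= REGEXP_COUNT( str, '[[:alpha:]]{2}[[:digit:]]{3}', 1, 'i' )
);
```

Like the original, this still requires 11.2 or later because of LISTAGG.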

Answer 5: (score: 0)

Not pretty, but it works.

SELECT REGEXP_REPLACE 
('Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
'^.*?(\([^)]*?\)).*?(\([^)]*?\)).*?(\([^)]*?\))','\1,\2,\3')
FROM DUAL;
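As written, each capture group above includes the surrounding parentheses, so the query would return (AB123),(MN456),(xy789) rather than the output requested in the question. A possible variant, offered only as a sketch, moves the escaped parentheses outside the groups. Like the original, it assumes the input contains exactly three parenthesised codes.

```sql
-- Sketch: capture only the text inside each pair of parentheses.
SELECT REGEXP_REPLACE(
         'Name1 Surname1 (AB123), Name2 Surname2 (MN456), Name3 Surname3(xy789)',
         '^.*?\(([^)]*?)\).*?\(([^)]*?)\).*?\(([^)]*?)\)',
         '\1,\2,\3'
       ) AS codes
FROM   DUAL;
-- For the sample string this should yield: AB123,MN456,xy789
```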