PL/SQL: split a string by pattern

Time: 2018-05-31 21:29:13

Tags: regex oracle plsql regexp-substr

Similar to this question...

How can I use regex to split a string, using a string as a delimiter?

...I am trying to split the following string:

Spent 30 CAD in movie tickets at Cineplex on 2018-06-01

The output I want is:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Spent
      2 30
      3 CAD
      4 movie tickets
      5 Cineplex
      6 2018-06-01

Likewise, it should be able to handle:

Paid 600 EUR to Electric Company

producing:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Paid
      2 600
      3 EUR
      4 
      5 Electric Company

I tried this regular expression, to no avail:

(\w+)(\D+)(\w+)(?(?=in)(\w+)(at)(\w+)(on)(.?$)|((?=to)(\w+)(.?$)))

I looked at several regex sites, plus this post, without much luck:

Extract some part of text separated by delimiter using regex

Can anyone help?

2 answers:

Answer 0: (score: 1)

Here is a simple SQL tokenizer that splits on spaces:

select regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01','[^ ]+', 1, level) from dual
connect by regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01', '[^ ]+', 1, level) is not null

Source: https://blogs.oracle.com/aramamoo/how-to-split-comma-separated-string-and-pass-to-in-clause-of-select-statement
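
To see why splitting on spaces alone falls short of the requested output, the same tokenizer can be run with `level` exposed as the element number. This is a minimal sketch, assuming Oracle 11g or later for `regexp_count`; the column aliases are chosen here to mirror the question's output.

```sql
-- Number each space-delimited token of the second sample string.
select level as element
     , regexp_substr('Paid 600 EUR to Electric Company', '[^ ]+', 1, level) as element_value
from dual
-- stop once every token has been produced
connect by level <= regexp_count('Paid 600 EUR to Electric Company', '[^ ]+');
```

This returns six rows (Paid, 600, EUR, to, Electric, Company): the delimiter word `to` is still present and `Electric Company` is split into two elements, so further processing is needed.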

Answer 1: (score: 0)

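There are two problems with your desired output. The first is how to define the tokens you want to exclude ('on', 'at', etc.). The second is how to ignore the spaces inside certain tokens ('Electric Company', 'movie tickets').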

The first point is easy to solve with a two-step approach. Step 1 splits the string on spaces; step 2 removes the unwanted tokens:

-- tokens to drop from the output
with exclude as (
  select 'in' as tkn from dual union all
  select 'at' as tkn from dual union all
  select 'to' as tkn from dual union all
  select 'on' as tkn from dual
  )
  -- step 1: split txt on spaces, one row per token
  , str as (
    select id
           , level as element_order
           , regexp_substr(txt, '[^ ]+', 1, level) as tkn
    from t23
    where id = 10
    -- the +1 yields one trailing null token, which is filtered out below
    connect by level <= regexp_count(txt, '[^ ]+')+1
    and id = prior id
    and prior sys_guid() is not null
    )
 -- step 2: drop the excluded tokens and renumber what is left
 select row_number() over (partition by str.id order by str.element_order) as element
       , str.tkn as element_value
 from str
      left join exclude on exclude.tkn = str.tkn
 where exclude.tkn is null
 and str.tkn is not null
 ;

Here is a SQL Fiddle demo.

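The second point is harder to solve. I think you would need another lookup table to identify the multi-word tokens, and perhaps use listagg() to concatenate them.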
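
One possible alternative to a lookup table, if the sentences always follow the two shapes shown in the question, is to treat the keywords themselves as anchors and pull each element out with a capture group. The following is only a sketch and has not been run against the asker's data. It assumes Oracle 11g or later (for the sub-expression argument of `regexp_substr`); `sample_txt` and `slots` are hypothetical names standing in for the real table and a row generator.

```sql
-- Sketch only: sample_txt is a hypothetical stand-in for the real table.
with sample_txt as (
  select 1 as id, 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01' as txt from dual union all
  select 2 as id, 'Paid 600 EUR to Electric Company' as txt from dual
)
, slots as (
  -- one row per output element, 1..6
  select level as element from dual connect by level <= 6
)
select s.id
     , n.element
     , case n.element
         -- elements 1-3 are the first three space-delimited words
         when 1 then regexp_substr(s.txt, '[^ ]+', 1, 1)
         when 2 then regexp_substr(s.txt, '[^ ]+', 1, 2)
         when 3 then regexp_substr(s.txt, '[^ ]+', 1, 3)
         -- element 4: text between ' in ' and ' at ' (null when absent)
         when 4 then regexp_substr(s.txt, ' in (.+) at ', 1, 1, null, 1)
         -- element 5: text after ' at ' or ' to ', up to ' on ' or end of string
         when 5 then regexp_substr(s.txt, ' (at|to) (.+?)( on |$)', 1, 1, null, 2)
         -- element 6: text after ' on ' (null when absent)
         when 6 then regexp_substr(s.txt, ' on (.+)$', 1, 1, null, 1)
       end as element_value
from sample_txt s
     cross join slots n
order by s.id, n.element;
```

Each pattern uses a single quantifier, which avoids mixing greedy and non-greedy matching inside one expression. For the second string, elements 4 and 6 come back null; wrapping the query and filtering on `element_value` would drop trailing nulls if they are unwanted. The approach breaks down if a value itself contains one of the keywords surrounded by spaces (for example, a place name containing ' at '), so it is a fit only for text as regular as the two samples.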