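Oracle regular expression: splitting a string from the last occurrence

Time: 2016-05-18 13:10:08

Tags: sql regex oracle oracle11g

I am still learning regexp in Oracle and I am stuck partway through. Below is my sample code: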

> tbl(pg, sql("
+     SELECT *
+     FROM fdata 
+     LEFT JOIN sdata 
+     ON fyear >= byear AND fyear < eyear")) %>%
+     explain()
<SQL>
SELECT "id", "fyear", "byear", "eyear", "val"
FROM (
    SELECT *
    FROM fdata 
    LEFT JOIN sdata 
    ON fyear >= byear AND fyear < eyear) AS "zzz140"


<PLAN>
Nested Loop Left Join  (cost=0.00..50886.88 rows=322722 width=40)
  Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))
  ->  Seq Scan on fdata  (cost=0.00..28.50 rows=1850 width=16)
  ->  Materialize  (cost=0.00..33.55 rows=1570 width=24)
        ->  Seq Scan on sdata  (cost=0.00..25.70 rows=1570 width=24)

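Thanks in advance.

2 answers:

Answer 0: (score: 5)

You can do this by extracting the individual capture groups (the parts enclosed in () parentheses), with no need for the double reversal: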

WITH t ( VAL ) AS (
  SELECT 'my_new_table_2015_06_31' FROM DUAL UNION ALL
  SELECT 'my_new_table_temp_2016_06_31' FROM DUAL
)
SELECT REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 1 ) AS COL4,
       REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 2 ) AS COL3,
       REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 3 ) AS COL2,
       REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 4 ) AS COL1
FROM   t;

You can even make the regular expression simpler by using:

'^(.+)_(.+)_(.+)_(.+)$'

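The first .+ is greedy, so it matches as much as it can, leaving only enough of the string for capture groups 2 to 4 to each match one underscore-delimited segment.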
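To see the greedy behaviour in isolation, here is a minimal sketch that runs the simplified pattern outside the database. It uses Python's `re` module purely as a stand-in for Oracle's regex engine (an assumption: the two engines agree on this particular pattern), and `split_last_three` is a hypothetical helper name, not something from the answer.

```python
import re

# Simplified pattern from the answer. Python's re is only a stand-in for
# Oracle's engine here (assumption: both handle this greedy pattern alike).
PATTERN = re.compile(r'^(.+)_(.+)_(.+)_(.+)$')

def split_last_three(val):
    """Return (col4, col3, col2, col1), or None when val has fewer than three underscores."""
    m = PATTERN.match(val)
    return m.groups() if m else None

for val in ('my_new_table_2015_06_31', 'my_new_table_temp_2016_06_31'):
    # The first group swallows every underscore except the last three.
    print(split_last_three(val))
```

The first group keeps any number of embedded underscores, which is why the same pattern handles both `my_new_table` and `my_new_table_temp`.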

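However, you do not need regular expressions at all. INSTR with a start position of -1 searches backwards from the end of the string, so it can locate the last three underscores directly: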

WITH t ( VAL ) AS (
  SELECT 'my_new_table_2015_06_31' FROM DUAL UNION ALL
  SELECT 'my_new_table_temp_2016_06_31' FROM DUAL
)
SELECT SUBSTR( val, 1,        pos1 - 1        ) AS col4,
       SUBSTR( val, pos1 + 1, pos2 - pos1 - 1 ) AS col3,
       SUBSTR( val, pos2 + 1, pos3 - pos2 - 1 ) AS col2,
       SUBSTR( val, pos3 + 1                  ) AS col1
FROM   (
  SELECT val,
         INSTR( val, '_', -1, 1 ) AS pos3,
         INSTR( val, '_', -1, 2 ) AS pos2,
         INSTR( val, '_', -1, 3 ) AS pos1
  FROM   t
);
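The position arithmetic in the query above can be sketched outside Oracle as follows. This is an illustration in Python rather than the answer's own code: `nth_from_end` is a hypothetical helper that mimics `INSTR(val, '_', -1, n)` for a single-character separator, using 0-based indexes instead of Oracle's 1-based positions, and returning `None` for short inputs is a choice made for this sketch only.

```python
def nth_from_end(val, sep, n):
    """0-based index of the n-th occurrence of sep counted from the end, or -1 if absent."""
    pos = len(val)
    for _ in range(n):
        # Search only the part of the string to the left of the previous hit.
        pos = val.rfind(sep, 0, pos)
        if pos == -1:
            return -1
    return pos

def split_by_position(val):
    """Return (col4, col3, col2, col1) using the last three underscores as cut points."""
    pos3 = nth_from_end(val, '_', 1)  # last underscore
    pos2 = nth_from_end(val, '_', 2)  # second to last
    pos1 = nth_from_end(val, '_', 3)  # third to last
    if pos1 == -1:
        return None  # fewer than three underscores (sketch-only convention)
    return (val[:pos1], val[pos1 + 1:pos2], val[pos2 + 1:pos3], val[pos3 + 1:])

print(split_by_position('my_new_table_temp_2016_06_31'))
```

As in the SQL, everything before the third-to-last underscore becomes col4, and each later column is the slice between two neighbouring positions.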

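Answer 1: (score: 0)

Adjust the col4 regular expression to match the rest of the string. The last parameter specifies the capture group within the regex match; the second-to-last parameter is needed only syntactically (it holds the match parameters).

The outer substr calls for cols 1-3 strip the underscore that is part of the match.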

with t(val)
as
(
  --format: xyz_year_month_date
  select 'my_new_table_2015_06_31' from dual
  union all
  select 'my_new_table_temp_2016_06_31' from dual
 )
 select reverse(regexp_substr(reverse(val),'([^_]+_){3}(.*)',1,1,'',2)) col4,
 substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,3)), 2) col3,
 substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,2)), 2) col2,
 substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,1)), 2) col1
 from t;
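The reverse-then-match idea can be traced step by step with a small sketch. It is written in Python as a stand-in for the Oracle functions (an assumption: `re.findall` yields the same successive matches that the occurrence argument of `regexp_substr` selects), and `split_via_reverse` is a hypothetical name used only here.

```python
import re

def split_via_reverse(val):
    """Return (col4, col3, col2, col1) by matching on the reversed string."""
    rev = val[::-1]  # the last three fields now come first
    # Counterpart of '([^_]+_){3}(.*)'; a non-capturing group is used here,
    # so the remainder is group 1 rather than subexpression 2 as in the SQL.
    rest = re.match(r'(?:[^_]+_){3}(.*)', rev)
    if rest is None:
        return None  # fewer than three underscores (sketch-only convention)
    # Occurrences 1 to 3 of '[^_]+_' each end with the underscore.
    parts = re.findall(r'[^_]+_', rev)[:3]
    # Reverse each piece back, then drop the leading underscore (the outer substr).
    col1, col2, col3 = (p[::-1][1:] for p in parts)
    col4 = rest.group(1)[::-1]
    return (col4, col3, col2, col1)

print(split_via_reverse('my_new_table_2015_06_31'))
```

Each short piece carries its underscore at the front once it is reversed back, which is the character the outer `substr(..., 2)` removes in the query.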