I'm still learning regular expressions (regexp) in Oracle and I'm stuck partway through. Below is my sample code:
> tbl(pg, sql("
+ SELECT *
+ FROM fdata
+ LEFT JOIN sdata
+ ON fyear >= byear AND fyear < eyear")) %>%
+ explain()
<SQL>
SELECT "id", "fyear", "byear", "eyear", "val"
FROM (
SELECT *
FROM fdata
LEFT JOIN sdata
ON fyear >= byear AND fyear < eyear) AS "zzz140"
<PLAN>
Nested Loop Left Join (cost=0.00..50886.88 rows=322722 width=40)
Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))
-> Seq Scan on fdata (cost=0.00..28.50 rows=1850 width=16)
-> Materialize (cost=0.00..33.55 rows=1570 width=24)
-> Seq Scan on sdata (cost=0.00..25.70 rows=1570 width=24)
Thanks in advance.
Answer 0: (score: 5)
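You can do this by extracting the individual capture groups (the parts enclosed in () parentheses), with no need for the double reversal: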
WITH t ( VAL ) AS (
SELECT 'my_new_table_2015_06_31' FROM DUAL UNION ALL
SELECT 'my_new_table_temp_2016_06_31' FROM DUAL
)
SELECT REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 1 ) AS COL4,
REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 2 ) AS COL3,
REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 3 ) AS COL2,
REGEXP_SUBSTR( val, '^(.*)_([^_]+)_([^_]+)_([^_]+)$', 1, 1, NULL, 4 ) AS COL1
FROM t
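You can even make the regular expression simpler by using:
'^(.+)_(.+)_(.+)_(.+)$'
The first .+ is greedy, so it matches as much as it can while still leaving just enough of the string for the second through fourth capture groups to each match.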
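As a minimal sketch of that simpler pattern, the query below reuses the sample rows from the query above and only swaps in the shorter regular expression. It assumes every value contains at least three underscores and that none of the last three segments is empty; a value that does not fit the pattern would return NULL in all four columns.

```sql
-- Sketch: same sample data as above, with the simpler pattern.
-- The greedy first group keeps everything before the last three underscores.
WITH t ( VAL ) AS (
  SELECT 'my_new_table_2015_06_31' FROM DUAL UNION ALL
  SELECT 'my_new_table_temp_2016_06_31' FROM DUAL
)
SELECT REGEXP_SUBSTR( val, '^(.+)_(.+)_(.+)_(.+)$', 1, 1, NULL, 1 ) AS COL4, -- table name
       REGEXP_SUBSTR( val, '^(.+)_(.+)_(.+)_(.+)$', 1, 1, NULL, 2 ) AS COL3, -- third segment from the end
       REGEXP_SUBSTR( val, '^(.+)_(.+)_(.+)_(.+)$', 1, 1, NULL, 3 ) AS COL2, -- second segment from the end
       REGEXP_SUBSTR( val, '^(.+)_(.+)_(.+)_(.+)$', 1, 1, NULL, 4 ) AS COL1  -- last segment
FROM   t;
```

For the second sample row, COL4 comes out as my_new_table_temp rather than my_new_table, because the first group only gives back as many characters as the other three groups need.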
However, you don't need regular expressions at all:
WITH t ( VAL ) AS (
SELECT 'my_new_table_2015_06_31' FROM DUAL UNION ALL
SELECT 'my_new_table_temp_2016_06_31' FROM DUAL
)
SELECT SUBSTR( val, 1, pos1 - 1 ) AS col4,
SUBSTR( val, pos1 + 1, pos2 - pos1 - 1 ) AS col3,
SUBSTR( val, pos2 + 1, pos3 - pos2 - 1 ) AS col2,
SUBSTR( val, pos3 + 1 ) AS col1
FROM (
SELECT val,
INSTR( val, '_', -1, 1 ) AS pos3,
INSTR( val, '_', -1, 2 ) AS pos2,
INSTR( val, '_', -1, 3 ) AS pos1
FROM t
);
Answer 1: (score: 0)
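Adjust the col4 regular expression so that it matches the rest of the string. The last argument selects the capture group within the regexp match; the second-to-last argument is needed only to fill its position (it holds the match parameter).
The outer substr calls for cols 1-3 strip the underscore that is part of the match.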
with t(val)
as
(
--format: xyz_year_month_date
select 'my_new_table_2015_06_31' from dual
union all
select 'my_new_table_temp_2016_06_31' from dual
)
select reverse(regexp_substr(reverse(val),'([^_]+_){3}(.*)',1,1,'',2)) col4,
substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,3)), 2) col3,
substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,2)), 2) col2,
substr(reverse(regexp_substr(reverse(val),'[^_]+_',1,1)), 2) col1
from t;