Regex: aggregating capturing groups

Time: 2017-08-06 09:32:49

Tags: sql regex oracle aggregation capturing-group

I have a data sample at http://sqlfiddle.com/#!4/ecba5/4:

with raw_data as(select
'a{ thing_a<1234:1.1>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<F>>thing_y<F>>thing_z<F>>#thing_a<234:1.2>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#thing_a<345:1.3>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#thing_a<456:1.4>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#};ADD{some_thing<1234>>some_id<null>>some_stuff<2>>some_date<2013-07-09+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<234>>some_id<null>>some_stuff<2>>some_date<2013-10-21+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<345>>some_id<null>>some_stuff<2>>some_date<2013-07-22+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<456>>some_id<null>>some_stuff<1>>some_date<2014-03-31+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#}]' value
from dual)
select 
regexp_count(value, 'thing_a<\d*:\d\.\d*>') thing_a, -- count all occurrences of thing_a
regexp_replace(value, 'thing_b<(-)>', '0'), -- How to only replace capturing group?
--regexp_substr(regexp_replace(value, 'thing_b<(-)>', '0'),'thing_b<(.)>', 1, 1, NULL, 1), -- assuming this works - how to sum it up? there is no regexp_sum to aggregate the capturing groups?
value
from raw_data;

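Some aggregations are easy: for example, regexp_count works fine for thing_a. For all the others, such as thing_b, each capturing group first needs a replacement: - should become NaN, F should become 0, and T should become 1. After that I want to sum the values, but there does not seem to be a regexp_sum.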

The desired output looks something like this:

thing_a,thing_b,thing_d,...,thing_x
4,0,0,...,3

1 answer:

Answer 0: (score: 1)

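You can do this by replacing the entire match with a reformatted version: capture the part you want to keep, write it back with the \1 backreference, and put the new value in place of the captured group you want to drop: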

regexp_replace(value, '(thing_b)<(-)>', '\1<0>')
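
To make the backreference easier to see, here is a minimal sketch on a shortened, made-up sample string rather than the full data. The literal and the column alias replaced are assumptions for illustration only.

```sql
-- Shortened sample string (an assumption, not the original data).
-- Group 1 captures the name "thing_b" and is written back via \1;
-- the captured "-" is dropped and 0 is written in its place.
select regexp_replace('thing_a<1:1.1>>thing_b<->>thing_c<T>>',
                      '(thing_b)<(-)>',
                      '\1<0>') as replaced
from dual;
-- replaced: thing_a<1:1.1>>thing_b<0>>thing_c<T>>
```

The parentheses around - are not strictly needed for the replacement, since only \1 is referenced in the replacement string.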

The values can then be summed with Oracle's built-in SUM():

sum(to_number(regexp_substr(regexp_replace(value, '(thing_b)<(-)>', '\1<0>'),'thing_b<(.)>', 1, 1, NULL, 1)))

Full query:

with raw_data as(select
  'a{ thing_a<1234:1.1>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<F>>thing_y<F>>thing_z<F>>#thing_a<234:1.2>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#thing_a<345:1.3>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#thing_a<456:1.4>>thing_b<->>thing_c<T>>thing_d<F>>thing_f<F>>thing_g<F>>thing_h<F>>thing_i<F>>thing_x<T>>thing_y<F>>thing_z<F>>#};ADD{some_thing<1234>>some_id<null>>some_stuff<2>>some_date<2013-07-09+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<234>>some_id<null>>some_stuff<2>>some_date<2013-10-21+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<345>>some_id<null>>some_stuff<2>>some_date<2013-07-22+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#some_thing<456>>some_id<null>>some_stuff<1>>some_date<2014-03-31+02:00>>thing_zz<1>>foo_bar<0>>status_foo<0>>bar_value<0>>#}]' value
  from dual)
select 
regexp_count(value, 'thing_a<\d*:\d\.\d*>') thing_a, -- count all occurrences of thing_a
regexp_replace(value, '(thing_b)<(-)>', '\1<0>') replaced_value, -- keep group 1 via \1, write 0 in place of "-"
sum(to_number(regexp_substr(regexp_replace(value, '(thing_b)<(-)>', '\1<0>'),'thing_b<(.)>', 1, 1, NULL, 1))) thing_b -- occurrence 1 only: reads the first thing_b value
from raw_data
group by value; -- required because sum() is combined with non-aggregated expressions (otherwise ORA-00937)
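
Note that regexp_substr(..., 1, 1, NULL, 1) returns the capturing group of the first occurrence only, so the SUM() above adds up a single value per row. One possible way to sum over every occurrence is to expand the string into one row per match with CONNECT BY LEVEL and then aggregate. The following is a sketch under stated assumptions, not a definitive implementation: it uses a shortened sample string, assumes raw_data holds exactly one row, and maps - to 0 as the answer does (the question mentions NaN instead). The CTE name occurrences and the column n are hypothetical.

```sql
with raw_data as (
  -- Shortened sample (an assumption); the real value has more fields per record.
  select 'thing_a<1:1.1>>thing_b<->>thing_x<F>>#'
      || 'thing_a<2:1.2>>thing_b<->>thing_x<T>>#'
      || 'thing_a<3:1.3>>thing_b<->>thing_x<T>>#' value
  from dual
),
occurrences as (
  -- One row per thing_x match; n is the occurrence number.
  -- Assumes raw_data has a single row, otherwise CONNECT BY multiplies rows.
  select level as n, value
  from raw_data
  connect by level <= regexp_count(value, 'thing_x<(.)>')
)
select sum(decode(regexp_substr(value, 'thing_x<(.)>', 1, n, null, 1),
                  'T', 1,
                  'F', 0,
                  '-', 0)) as thing_x  -- "-" mapped to 0 as in the answer above
from occurrences;
```

For this shortened sample the captured values are F, T, T, so thing_x should come out as 2. For fields that only take T or F, regexp_count(value, 'thing_x<T>') would give the same total without the row expansion, since only the T values contribute to the sum.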