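I need help with an Oracle query that groups by COLUMN_1 and returns the number of matching characters in COLUMN_2. The matching needs to start from the right, i.e. from the last character of COLUMN_2, because some characters at the start of the column always differ.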
COLUMN_1 COLUMN_2
53bf8a7c860a11e5a7ab2b0669b590c8 5ce63254860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19
The output of running the query on the data above should be:
COLUMN_1 Count_of_COLUMN_2
53bf8a7c860a11e5a7ab2b0669b590c8 24
c05d6368860811e5983f09a623895e19 24
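I will use this to identify whether there is a pattern in COLUMN_2, i.e. whether I always get the same number of matching characters.
Answer 0 (score: 4)
If you're after any matching characters, counting from the right-hand side of the string, even if they're not adjacent (e.g. 'abc' and 'badc' would have a matching count of 2, since positions 1 and 3 (from the right) match in both columns), then this should do the trick: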
with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select column_1,
       column_2,
       -- count the positions (from the right) where both strings hold the same character
       count(case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end) cnt_matching_chars
from   sample_data
-- generate one row per character of column_2 for each input row
connect by prior column_1 = column_1
           and prior column_2 = column_2
           and prior sys_guid() is not null
           and level <= length(column_2)
group by column_1,
         column_2;
COLUMN_1 COLUMN_2 CNT_MATCHING_CHARS
-------------------------------- -------------------------------- ------------------
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8 25
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19 24
53bf8a7c860a11e5a7ab2b0669b590c8 5ce63254860a11e5a7ab2b0669b590c8 25
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19 24
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19 26
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8 26
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19 23
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8 23
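This basically takes the strings, turns them into one row per character of the string in column_2, compares the characters at the same position (counting from the right-hand edge), and then counts the matches.
However, if you're looking for the contiguous run of matching characters on the right-hand side, and don't care about any further matches to the left after a "break", then the following should do the trick: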
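As a cross-check outside the database, the same per-position comparison can be sketched in a few lines of Python. This is only an illustration of the counting logic, not part of the Oracle solution; the function name `count_matching_positions` is made up for this example.

```python
def count_matching_positions(a: str, b: str) -> int:
    """Count positions, aligned from the right, where a and b hold the same character."""
    # zip() stops at the shorter string, so positions that exist in only
    # one of the two strings never count as a match.
    return sum(1 for x, y in zip(reversed(a), reversed(b)) if x == y)


print(count_matching_positions("abc", "badc"))  # 2
print(count_matching_positions("53bf8a7c860a11e5a7ab2b0669b590c8",
                               "5ce63254860a11e5a7ab2b0669b590c8"))  # 25
```

The second call reproduces the 25 shown for that pair in the output above: 24 trailing characters match, plus the leading '5'.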
with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select column_1,
       column_2,
       count(matching_chars) cnt_matching_chars
from   (select column_1,
               column_2,
               case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end matching_chars,
               -- Tabibitosan: overall row number minus row number within the match/no-match partition
               row_number() over (partition by column_1, column_2
                                  order by level)
                 - row_number() over (partition by column_1, column_2, case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end
                                      order by level) grp
        from   sample_data
        connect by prior column_1 = column_1
                   and prior column_2 = column_2
                   and prior sys_guid() is not null
                   and level <= length(column_2))
where  grp = 0 -- keep only the first run, i.e. the one touching the right-hand end
group by column_1,
         column_2,
         grp
order by column_1,
         column_2;
COLUMN_1 COLUMN_2 CNT_MATCHING_CHARS
-------------------------------- -------------------------------- ------------------
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8 20
53bf8a7c860a11e5a7ab2b0669b590c8 5ce63254860a11e5a7ab2b0669b590c8 24
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8 24
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8 25
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19 20
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19 20
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19 20
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19 25
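This is similar to counting the matching characters, but it also uses Tabibitosan to work out the groups of consecutive matching characters before taking the first group and counting it.
If you're after the minimum count across all the column_2 rows for each column_1, then you need to wrap another GROUP BY around the query, e.g. for the second query: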
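To make the Tabibitosan step easier to follow, here is a rough Python sketch of the same idea: number all rows, number the rows within each match/no-match partition, and subtract. Rows belonging to the first run from the right get a difference of 0. The helper names `tabibitosan_groups` and `trailing_match_length` are invented for this illustration.

```python
def tabibitosan_groups(flags):
    """Return, per row, the overall row number minus the row number within the same flag value."""
    seen = {}
    groups = []
    for overall_rn, flag in enumerate(flags, start=1):
        seen[flag] = seen.get(flag, 0) + 1
        groups.append(overall_rn - seen[flag])
    return groups


def trailing_match_length(a: str, b: str) -> int:
    """Length of the contiguous run of matching characters at the right-hand end."""
    # One flag per position, starting from the last character of each string.
    flags = [x == y for x, y in zip(reversed(a), reversed(b))]
    groups = tabibitosan_groups(flags)
    # Keep only the first run (grp = 0) and count its matching rows.
    return sum(1 for flag, grp in zip(flags, groups) if grp == 0 and flag)


print(tabibitosan_groups([True, False, True]))  # [0, 1, 1]
print(trailing_match_length("53bf8a7c860a11e5a7ab2b0669b590c8",
                            "0df52992860b11e5a7ab2b0669b590c8"))  # 20
```

A difference of 0 can only occur while every row so far has had the same flag, which is why filtering on grp = 0 isolates the run that touches the right-hand end.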
with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select column_1,
       min(cnt_matching_chars) min_cnt_matching_chars
from   (select column_1,
               column_2,
               count(matching_chars) cnt_matching_chars
        from   (select column_1,
                       column_2,
                       case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end matching_chars,
                       -- Tabibitosan: overall row number minus row number within the match/no-match partition
                       row_number() over (partition by column_1, column_2
                                          order by level)
                         - row_number() over (partition by column_1, column_2, case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end
                                              order by level) grp
                from   sample_data
                connect by prior column_1 = column_1
                           and prior column_2 = column_2
                           and prior sys_guid() is not null
                           and level <= length(column_2))
        where  grp = 0 -- keep only the first run, i.e. the one touching the right-hand end
        group by column_1,
                 column_2,
                 grp)
-- outer aggregation: smallest trailing match per column_1
group by column_1
order by column_1;
COLUMN_1 MIN_CNT_MATCHING_CHARS
-------------------------------- ----------------------
53bf8a7c860a11e5a7ab2b0669b590c8 20
c05d6368860811e5983f09a623895e19 20
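The two-level aggregation (count per pair, then minimum per column_1) can be checked against the sample data with a short Python sketch. It assumes a plain loop for the trailing match instead of Tabibitosan, and the names `trailing_match_length` and `min_trailing_match_by_group` are hypothetical.

```python
def trailing_match_length(a: str, b: str) -> int:
    """Length of the contiguous run of matching characters at the right-hand end."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def min_trailing_match_by_group(rows):
    """Map each column_1 value to the smallest trailing match over its column_2 values."""
    result = {}
    for col1, col2 in rows:
        cnt = trailing_match_length(col1, col2)
        result[col1] = min(result.get(col1, cnt), cnt)
    return result


A = "53bf8a7c860a11e5a7ab2b0669b590c8"
B = "c05d6368860811e5983f09a623895e19"
sample_data = [
    (A, "5ce63254860a11e5a7ab2b0669b590c8"),
    (A, "f35c3a08860a11e5a7ab2b0669b590c8"),
    (A, "f49712bc860a11e5a7ab2b0669b590c8"),
    (A, "0df52992860b11e5a7ab2b0669b590c8"),
    (B, "d1fd4548860811e5983f09a623895e19"),
    (B, "87ea0648860911e5983f09a623895e19"),
    (B, "0316e024860b11e5983f09a623895e19"),
    (B, "0450d68e860b11e5983f09a623895e19"),
]

print(min_trailing_match_by_group(sample_data))  # both groups map to 20
```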
Answer 1 (score: 0)
Here is a solution using a simple regular expression. There is no need to split the strings into individual characters.
with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce6354860a11e5a7ab2b0669b590c89' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select column_1,
       column_2,
       length(regexp_substr(column_1 ||'-'|| column_2, '(\w+)-.*\1$', 1, 1, 'i', 1)) matching_chars
from   sample_data;
column_1 column_2 matching_chars
--------------------------------------------------------------------------------------
53bf8a7c860a11e5a7ab2b0669b590c8 5ce6354860a11e5a7ab2b0669b590c89 null
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8 24
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8 25
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8 20
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19 25
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19 20
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19 20
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19 20
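The idea is to concatenate the strings with a delimiter and use a backreference to capture the characters that appear both immediately before the delimiter and at the end of the whole string, then take the length of the captured string. Note that the first COLUMN_2 value in this sample has been altered so that its last character differs from COLUMN_1, which is why that row returns null.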
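The same pattern can be tried outside Oracle. The sketch below uses Python's `re` module, on the assumption that for this particular pattern it behaves like Oracle's `REGEXP_SUBSTR`; the function name `common_suffix_length` and its `sep` parameter are made up for the example, and the delimiter must not occur in the data.

```python
import re
from typing import Optional


def common_suffix_length(a: str, b: str, sep: str = "-") -> Optional[int]:
    """Length of the longest common suffix of a and b, or None if there is none."""
    # (\w+) must end right before the delimiter and reappear at the very end
    # of the joined string; the leftmost match gives the longest such suffix.
    pattern = r"(\w+)" + re.escape(sep) + r".*\1$"
    m = re.search(pattern, a + sep + b, flags=re.IGNORECASE)
    return len(m.group(1)) if m else None


col1 = "53bf8a7c860a11e5a7ab2b0669b590c8"
print(common_suffix_length(col1, "5ce6354860a11e5a7ab2b0669b590c89"))  # None
print(common_suffix_length(col1, "f35c3a08860a11e5a7ab2b0669b590c8"))  # 24
```

Because the search tries start positions from left to right, the first successful match is the one with the longest captured suffix, which is what makes the length meaningful.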