Query to group by and count

Date: 2015-11-09 16:22:51

Tags: sql regex oracle oracle11g

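I need help with an Oracle query that groups by COLUMN_1 and returns the number of characters in COLUMN_2 that match. The matching needs to start from the right, i.e. from the last character of COLUMN_2, because some characters at the beginning of the column are always different.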

COLUMN_1                            COLUMN_2
53bf8a7c860a11e5a7ab2b0669b590c8    5ce63254860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8    f35c3a08860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8    f49712bc860a11e5a7ab2b0669b590c8
53bf8a7c860a11e5a7ab2b0669b590c8    0df52992860b11e5a7ab2b0669b590c8
c05d6368860811e5983f09a623895e19    d1fd4548860811e5983f09a623895e19
c05d6368860811e5983f09a623895e19    87ea0648860911e5983f09a623895e19
c05d6368860811e5983f09a623895e19    0316e024860b11e5983f09a623895e19
c05d6368860811e5983f09a623895e19    0450d68e860b11e5983f09a623895e19

The output of the query run against the data above should be:

COLUMN_1                            Count_of_COLUMN_2
53bf8a7c860a11e5a7ab2b0669b590c8    24
c05d6368860811e5983f09a623895e19    24

I will use this to identify whether there is a pattern in COLUMN_2, i.e. whether I always get the same number of matching characters.

2 answers:

Answer 0: (score: 4)

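If you're after any matching characters, counting from the right-hand side of the string, even when they are not adjacent (e.g. 'abc' and 'badc' would have a matching count of 2, because positions 1 and 3 from the right match in both columns), then this should do the trick: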

with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select     column_1,
           column_2,
           count(case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end) cnt_matching_chars
from       sample_data
connect by prior column_1 = column_1
           and prior column_2 = column_2
           and prior sys_guid() is not null
           and level <= length(column_2)
group by   column_1,
           column_2;

COLUMN_1                         COLUMN_2                         CNT_MATCHING_CHARS
-------------------------------- -------------------------------- ------------------
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8                 25
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19                 24
53bf8a7c860a11e5a7ab2b0669b590c8 5ce63254860a11e5a7ab2b0669b590c8                 25
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19                 24
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19                 26
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8                 26
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19                 23
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8                 23

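This basically takes the strings, turns them into one row per character of the string in column_2, compares the characters at the same position (counting from the right-hand edge), and then counts the matches.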
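As a cross-check outside the database, the same per-position comparison can be sketched in Python. This is only an illustration of the logic, not part of the Oracle solution; the function name `count_positional_matches` is made up for the example.

```python
def count_positional_matches(column_1: str, column_2: str) -> int:
    """Count positions, numbered from the right, where both strings hold the same character."""
    # zip() stops at the shorter string. That mirrors the SQL, where SUBSTR
    # returns NULL (and so never matches) once LEVEL runs past the shorter value.
    return sum(a == b for a, b in zip(reversed(column_1), reversed(column_2)))


print(count_positional_matches("abc", "badc"))  # 2
print(count_positional_matches("53bf8a7c860a11e5a7ab2b0669b590c8",
                               "5ce63254860a11e5a7ab2b0669b590c8"))  # 25
```

The second call reproduces the 25 shown for that pair in the output above: 24 trailing characters plus the leading '5'.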

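However, if you're after the run of consecutive matching characters at the right-hand end, and don't care about any further characters that match to the left after the "break", then the following should do the trick: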

with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select   column_1,
         column_2,
         count(matching_chars) cnt_matching_chars
from     (select     column_1,
                     column_2,
                     case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end matching_chars,
                     row_number() over (partition by column_1, column_2
                                        order by level)
                       - row_number() over (partition by column_1, column_2, case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end
                                            order by level) grp
          from       sample_data
          connect by prior column_1 = column_1
                     and prior column_2 = column_2
                     and prior sys_guid() is not null
                     and level <= length(column_2))
where    grp = 0
group by column_1,
         column_2,
         grp
order by column_1,
         column_2;

COLUMN_1                         COLUMN_2                         CNT_MATCHING_CHARS
-------------------------------- -------------------------------- ------------------
53bf8a7c860a11e5a7ab2b0669b590c8 0df52992860b11e5a7ab2b0669b590c8                 20
53bf8a7c860a11e5a7ab2b0669b590c8 5ce63254860a11e5a7ab2b0669b590c8                 24
53bf8a7c860a11e5a7ab2b0669b590c8 f35c3a08860a11e5a7ab2b0669b590c8                 24
53bf8a7c860a11e5a7ab2b0669b590c8 f49712bc860a11e5a7ab2b0669b590c8                 25
c05d6368860811e5983f09a623895e19 0316e024860b11e5983f09a623895e19                 20
c05d6368860811e5983f09a623895e19 0450d68e860b11e5983f09a623895e19                 20
c05d6368860811e5983f09a623895e19 87ea0648860911e5983f09a623895e19                 20
c05d6368860811e5983f09a623895e19 d1fd4548860811e5983f09a623895e19                 25

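This is similar to counting the matching characters, but it also uses the Tabibitosan method to work out the groups of matching characters before taking the first group and doing the count.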
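The grouping step can be hard to picture, so here is a rough Python model of it. It is a sketch of the idea rather than a translation of the query: `leading_run_via_tabibitosan` is a hypothetical name, and the per-flag counter stands in for the second `row_number()`, which is partitioned by the match flag.

```python
from collections import defaultdict


def leading_run_via_tabibitosan(column_1: str, column_2: str) -> int:
    """Length of the matching run at the right-hand end, found via the row-number difference."""
    flags = [a == b for a, b in zip(reversed(column_1), reversed(column_2))]
    seen = defaultdict(int)  # running count per flag value, like row_number() per partition
    count = 0
    for overall_rn, flag in enumerate(flags, start=1):
        seen[flag] += 1
        # The difference stays constant within each consecutive run of equal flags,
        # and it is 0 only for the very first run.
        grp = overall_rn - seen[flag]
        if grp == 0 and flag:  # WHERE grp = 0; COUNT then skips the non-matching rows
            count += 1
    return count


print(leading_run_via_tabibitosan("53bf8a7c860a11e5a7ab2b0669b590c8",
                                  "0df52992860b11e5a7ab2b0669b590c8"))  # 20
print(leading_run_via_tabibitosan("abc", "badc"))  # 1
```

If the last characters differ, the first run consists of non-matches only, so the result is 0, which is also what `count(matching_chars)` gives in the query.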

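If you're after the minimum count across all the column_2 rows for each column_1, then you need to wrap another GROUP BY around the query, e.g. for the second query: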

with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce63254860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select   column_1,
         min(cnt_matching_chars) min_cnt_matching_chars
from     (select   column_1,
                   column_2,
                   count(matching_chars) cnt_matching_chars
          from     (select     column_1,
                               column_2,
                               case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end matching_chars,
                               row_number() over (partition by column_1, column_2
                                                  order by level)
                                 - row_number() over (partition by column_1, column_2, case when substr(column_1, -level, 1) = substr(column_2, -level, 1) then 1 end
                                                      order by level) grp
                    from       sample_data
                    connect by prior column_1 = column_1
                               and prior column_2 = column_2
                               and prior sys_guid() is not null
                               and level <= length(column_2))
          where    grp = 0
          group by column_1,
                   column_2,
                   grp)
group by column_1
order by column_1;

COLUMN_1                         MIN_CNT_MATCHING_CHARS
-------------------------------- ----------------------
53bf8a7c860a11e5a7ab2b0669b590c8                     20
c05d6368860811e5983f09a623895e19                     20
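To sanity-check the two minimums above without a database, the following Python sketch computes the common-suffix length for each sample pair and takes the minimum per column_1. The helper name `common_suffix_length` and the dictionary layout are assumptions made for the example.

```python
from itertools import takewhile

# Sample rows from the question, keyed by column_1.
sample_data = {
    "53bf8a7c860a11e5a7ab2b0669b590c8": [
        "5ce63254860a11e5a7ab2b0669b590c8",
        "f35c3a08860a11e5a7ab2b0669b590c8",
        "f49712bc860a11e5a7ab2b0669b590c8",
        "0df52992860b11e5a7ab2b0669b590c8",
    ],
    "c05d6368860811e5983f09a623895e19": [
        "d1fd4548860811e5983f09a623895e19",
        "87ea0648860911e5983f09a623895e19",
        "0316e024860b11e5983f09a623895e19",
        "0450d68e860b11e5983f09a623895e19",
    ],
}


def common_suffix_length(a: str, b: str) -> int:
    """Number of consecutive equal characters, counted from the right-hand end."""
    pairs = zip(reversed(a), reversed(b))
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], pairs))


# Equivalent of the outer GROUP BY column_1 with MIN(cnt_matching_chars).
min_by_column_1 = {
    column_1: min(common_suffix_length(column_1, c2) for c2 in column_2_values)
    for column_1, column_2_values in sample_data.items()
}

for column_1, min_cnt in sorted(min_by_column_1.items()):
    print(column_1, min_cnt)
```

Both groups come out at 20, matching the query result above.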

Answer 1: (score: 0)

Here is a solution using a simple regular expression. There is no need to split the strings into individual characters.

with sample_data as (select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '5ce6354860a11e5a7ab2b0669b590c89' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f35c3a08860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, 'f49712bc860a11e5a7ab2b0669b590c8' column_2 from dual union all
                     select '53bf8a7c860a11e5a7ab2b0669b590c8' column_1, '0df52992860b11e5a7ab2b0669b590c8' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, 'd1fd4548860811e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '87ea0648860911e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0316e024860b11e5983f09a623895e19' column_2 from dual union all
                     select 'c05d6368860811e5983f09a623895e19' column_1, '0450d68e860b11e5983f09a623895e19' column_2 from dual)
select column_1,
       column_2,
       length(regexp_substr(column_1 ||'-'|| column_2,'(\w+)-.*\1$',1,1,'i',1)) matching_chars
from sample_data;

column_1                            column_2                            matching_chars
--------------------------------------------------------------------------------------
53bf8a7c860a11e5a7ab2b0669b590c8    5ce6354860a11e5a7ab2b0669b590c89    null
53bf8a7c860a11e5a7ab2b0669b590c8    f35c3a08860a11e5a7ab2b0669b590c8    24
53bf8a7c860a11e5a7ab2b0669b590c8    f49712bc860a11e5a7ab2b0669b590c8    25
53bf8a7c860a11e5a7ab2b0669b590c8    0df52992860b11e5a7ab2b0669b590c8    20
c05d6368860811e5983f09a623895e19    d1fd4548860811e5983f09a623895e19    25
c05d6368860811e5983f09a623895e19    87ea0648860911e5983f09a623895e19    20
c05d6368860811e5983f09a623895e19    0316e024860b11e5983f09a623895e19    20
c05d6368860811e5983f09a623895e19    0450d68e860b11e5983f09a623895e19    20

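The idea is to concatenate the strings with a separator and use a backreference to extract the characters that match both immediately before the separator and at the end of the string, and then take the length of the extracted string.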
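The same pattern can be tried in Python's `re` module to see how the backreference behaves. This is a hedged sketch: `suffix_match_length` is a hypothetical helper, and it assumes both values contain only word characters (as the hexadecimal sample data does), so the '-' separator cannot occur inside either value.

```python
import re
from typing import Optional


def suffix_match_length(column_1: str, column_2: str) -> Optional[int]:
    """Length of the longest common suffix, or None when the last characters differ."""
    # (\w+) must run up to the separator, so it is always a suffix of column_1;
    # \1$ then requires that same text at the very end of column_2.
    # The search tries start positions left to right, so the longest suffix wins.
    m = re.search(r"(\w+)-.*\1$", column_1 + "-" + column_2, flags=re.IGNORECASE)
    return len(m.group(1)) if m else None


print(suffix_match_length("53bf8a7c860a11e5a7ab2b0669b590c8",
                          "f35c3a08860a11e5a7ab2b0669b590c8"))  # 24
print(suffix_match_length("53bf8a7c860a11e5a7ab2b0669b590c8",
                          "5ce6354860a11e5a7ab2b0669b590c89"))  # None
```

As in the SQL output, a pair with no common suffix yields no match (NULL in Oracle, `None` here) rather than 0.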