I have two tables. For example, here are two small input tables:
Table1:-
columnA
man got wounded by dog
joe met sally
Table2:-
ColumnB
life is good
dog man got hunt
dumb man wounded iron
For each row in columnA, I want to find the row in ColumnB that shares the largest number of matching words with it. For example:
Intermediate Output of above table should be:-
ColumnA ColumnB words_matching number_of_words
"man got wounded by dog" "dumb man wounded iron" "man,wounded" 2
"man got wounded by dog" "dog man got hunt" "dog,man,got" 3
In the final output, I want to show only the best match:
ColumnA ColumnB words_matching number_of_words
"man got wounded by dog" "dog man got hunt" "dog,man,got" 3
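P.S.: I have shown the output for only one case; the real tables are very large. I also could not add spacing between the column data, so I used quotation marks instead.
I tried doing this with a hierarchical query, but it takes a very long time. Here is an example of how I split the strings into words: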
select column1,regexp_substr(column1,'[^ ]+', 1, level) break_1 from table1
connect by regexp_substr(column1,'[^ ]+', 1, level) is not null;
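One likely reason this split is slow: nothing in the connect by clause ties a child row to its own parent row, so on a multi-row table every row can connect to every other row at each level, and the row count grows very quickly. A commonly used way to restrict the split to one row at a time is sketched below. It assumes that the value of column1 identifies a row, which is why the sketch reads from a distinct subquery.

```sql
-- Sketch: split each distinct column1 value into words, one row per word.
-- Assumption: column1 values are deduplicated first, so "prior column1 = column1"
-- keeps each hierarchy confined to a single source string.
select column1,
       regexp_substr(column1, '[^ ]+', 1, level) break_1
from (select distinct column1 from table1)
connect by regexp_substr(column1, '[^ ]+', 1, level) is not null
       and prior column1 = column1
       -- a non-deterministic call stops Oracle from reporting a false cycle
       and prior sys_guid() is not null;
```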
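Below is another query I came up with, but because of the Cartesian join its performance is very poor, so I do not think it is a good idea for large data: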
select st1,st2,
max(round((extractvalue(dbms_xmlgen.getxmltype('select cardinality (
sys.dbms_debug_vc2coll(''' || replace(replace(lower(st1),''''), ' ', ''',''' ) || ''') multiset intersect
sys.dbms_debug_vc2coll('''||replace(replace(lower(st2),''''), ' ', ''',''' )||''')) x from dual'), '//text()')),2)) seq
from (
select l1.column1 st1,l2.column2 st2
from
table1 l1,table2 l2 ) group by st1,st2;
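Can anyone suggest a good approach?
Answer 0 (score: 1)
I found a faster way to solve the problem above: use a procedure to split the strings into words and store them in separate tables, then use those tables to find the matching strings.
Steps: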
create or replace
procedure split_string_word_match
as
  type varr is table of varchar2(4000);
  list1 varr;
  list2 varr;
begin
  -- load the distinct strings from each source table
  select distinct column1 bulk collect into list1 from table1;
  select distinct column2 bulk collect into list2 from table2;
  -- split every table1 string into words, one row per word;
  -- 1..count is safe for an empty collection (first..last would raise an error)
  for k in 1..list1.count
  loop
    insert into list1_result
    select list1(k), regexp_substr(list1(k), '[^ ]+', 1, level) break_1 from dual
    connect by regexp_substr(list1(k), '[^ ]+', 1, level) is not null;
  end loop;
  -- do the same for every table2 string
  for i in 1..list2.count
  loop
    insert into list2_result
    select list2(i), regexp_substr(list2(i), '[^ ]+', 1, level) break_2 from dual
    connect by regexp_substr(list2(i), '[^ ]+', 1, level) is not null;
  end loop;
  -- commit once at the end rather than once per string
  commit;
end;
/
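The procedure inserts into list1_result and list2_result, which are not defined anywhere in this post and need to exist before the procedure is compiled. A minimal sketch of what they might look like follows. The column names are inferred from the matching query in the next step (column1, column2, break_1), and the data types are an assumption.

```sql
-- Hypothetical definitions of the two result tables (not shown in the original answer).
-- Each row holds one source string and one of its words.
create table list1_result (
  column1 varchar2(4000),  -- full string from table1
  break_1 varchar2(4000)   -- one word of that string
);

create table list2_result (
  column2 varchar2(4000),  -- full string from table2
  break_1 varchar2(4000)   -- one word of that string
);

-- Run the split once the tables exist and the procedure has compiled.
begin
  split_string_word_match;
end;
/
```

Because the procedure only appends rows, the result tables would need to be emptied before running it a second time.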
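Then run the SQL below against the result tables to find the best-matching strings. (It runs faster than doing the work with many loops inside the procedure, which is why I wrote it as a single SQL statement.)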
select st1,st2,cs_string ,max(cnt) max_count
from (
select l1.column1 st1,l2.column2 st2,listagg(l1.break_1,',') within group(order by l1.break_1) cs_string ,count(1) cnt
from list1_result l1,list2_result l2
where l1.break_1 = l2.break_1
group by l1.column1,l2.column2)
group by st1,st2,cs_string;
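Note that the query above returns one row per matching pair, which corresponds to the intermediate output in the question, because the outer group by uses the same columns that already make each row unique. To keep only the best match for each table1 string, one option is to rank the pairs by their word count. The following is a sketch under the same assumed table layout; the distinct subqueries are an addition so that a word repeated inside one string is counted once.

```sql
-- Sketch: keep only the table2 string(s) with the most shared words per table1 string.
select st1, st2, cs_string, cnt as max_count
from (
  select m.*,
         -- rank() keeps ties; use row_number() to force exactly one row per st1
         rank() over (partition by st1 order by cnt desc) rnk
  from (
    select l1.column1 st1,
           l2.column2 st2,
           listagg(l1.break_1, ',') within group (order by l1.break_1) cs_string,
           count(*) cnt
    from (select distinct column1, break_1 from list1_result) l1
    join (select distinct column2, break_1 from list2_result) l2
      on l1.break_1 = l2.break_1
    group by l1.column1, l2.column2
  ) m
)
where rnk = 1;
```

The comparison is case-sensitive as written; wrapping break_1 in lower() on both sides would make it case-insensitive, as in the question's second query.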