What is the best way to do a word-match search in PL/SQL?
E.g. for the string "BROUGHTONS OF CHELTENHAM LIMITED":
"BROUGHTONS LIMITED" is a match
"OF LIMITED" is a match
"CHELTENHAM BROUGHTONS" is a match
"BROUG" is not a match
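Answer 0 (score: 4)
This is a fairly crude approach, but it should do what you are asking. As Xophmeister pointed out, you will probably need to tokenize each string and then search the tokens (since you want to match words out of order, a simple "like %tokenA%tokenB%tokenC%" will not work).
Also, this does not even touch on phonetic matching, soundex and the like. But, again, that is not what you asked. It also does not address performance or scaling, and is probably acceptable only for small sets of data.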
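To illustrate why a plain LIKE pattern is not enough, here is a small check against the string from the question. It is only a sketch of the two failure modes: LIKE is order-sensitive, and it matches substrings rather than whole words.

```sql
-- LIKE with chained wildcards only matches tokens in the given order,
-- and it matches partial words as well as whole ones.
select case when 'BROUGHTONS OF CHELTENHAM LIMITED' like '%CHELTENHAM%BROUGHTONS%'
            then 'match' else 'no match' end as out_of_order_words,
       case when 'BROUGHTONS OF CHELTENHAM LIMITED' like '%BROUG%'
            then 'match' else 'no match' end as partial_word
from dual;
```

The first column comes back as 'no match' although the question wants a match, and the second comes back as 'match' although the question wants none.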
So, first we need a split function:
create or replace
function fn_split(i_string       in varchar2,
                  i_delimiter    in varchar2 default ',',
                  b_dedup_tokens in number   default 0)
  return sys.dbms_debug_vc2coll
as
  l_tab sys.dbms_debug_vc2coll;
begin
  -- Pull out each run of non-delimiter characters, one row per token.
  -- Note: i_delimiter is placed inside a regex character class as-is.
  select regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, level)
    bulk collect into l_tab
    from dual
  connect by regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, level) is not null
   order by level;

  -- Optionally collapse repeated tokens so each one is counted only once.
  if (b_dedup_tokens > 0) then
    return l_tab multiset union distinct l_tab;
  end if;

  return l_tab;
end;
/
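As a quick sanity check, and assuming fn_split has been compiled as shown above, the function can be called on its own through table(). The input strings here are just the examples already used in this thread.

```sql
-- Split the string from the question on spaces: one row per word.
select column_value as token
from table(fn_split('BROUGHTONS OF CHELTENHAM LIMITED', ' '));

-- Same call with de-duplication switched on: the repeated 'John' is returned once.
select column_value as token
from table(fn_split('John John Smith', ' ', 1));
```

The first query should return four tokens and the second two. Because the delimiter is concatenated straight into a regular expression character class, characters that are special inside brackets (such as ']' or '^') would need extra care.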
Now we can use it to check strings for specific tokens. Here I am searching for 3 tokens (John, Q, Public) in a small set of sample data:
with test_data as (
  select 1 as id, 'John Q Public' as full_name from dual
  union
  select 2 as id, 'John John Smith' as full_name from dual
  union
  select 3 as id, 'Sally Smith' as full_name from dual
  union
  select 4 as id, 'Mr John B B Q Public' as full_name from dual
  union
  select 5 as id, 'A Public John' as full_name from dual
)
select d.id, d.full_name, count(1) as hits
from test_data d, table(fn_split(d.full_name, ' ', 1))
-- should have at least 1 of these tokens
where column_value in ('John', 'Q', 'Public')
group by d.id, d.full_name
-- can also restrict results to those with at least x token hits
having count(1) >= 2
-- most hits at top of results
order by count(1) desc, id asc;
Output:
"ID"  "FULL_NAME"               "HITS"
1     "John Q Public"           3
4     "Mr John B B Q Public"    3
5     "A Public John"           2
You can also add "upper" to make the match case-insensitive, and so on.
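The query above ranks rows by how many tokens hit. The question, though, wants a match only when every search word appears as a whole word. One possible way to express that with the same fn_split function is sketched below; the searches CTE and its column name are made up for the illustration, and upper() is applied on both sides to make the comparison case-insensitive.

```sql
with test_data as (
  select 1 as id, 'BROUGHTONS OF CHELTENHAM LIMITED' as full_name from dual
),
searches as (  -- hypothetical list of search phrases, taken from the question
  select 'BROUGHTONS LIMITED' as phrase from dual union all
  select 'OF LIMITED' from dual union all
  select 'CHELTENHAM BROUGHTONS' from dual union all
  select 'BROUG' from dual
)
select s.phrase, d.full_name
from searches s cross join test_data d
-- keep the pair only if no search token is missing from the name's tokens
where not exists (
  select column_value from table(fn_split(upper(s.phrase), ' ', 1))
  minus
  select column_value from table(fn_split(upper(d.full_name), ' ', 1))
);
```

With this data the first three phrases should be returned and 'BROUG' should not, since it is not a complete token of the name. Like the rest of this answer, it splits every row at query time, so it is only reasonable for small data sets.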
Answer 1 (score: 0)
Use an Oracle Text index. This will let you issue powerful CONTAINS queries.
http://docs.oracle.com/cd/B28359_01/text.111/b28303/quicktour.htm
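A minimal sketch of what that could look like, assuming Oracle Text is installed and the schema is allowed to create CONTEXT indexes. The table and index names (companies, companies_name_ctx) are invented for the example.

```sql
-- Hypothetical table holding the names to search.
create table companies (id number primary key, name varchar2(200));
insert into companies values (1, 'BROUGHTONS OF CHELTENHAM LIMITED');
commit;

-- Build the text index after loading the data so the existing row gets indexed.
create index companies_name_ctx on companies (name)
  indextype is ctxsys.context;

-- All listed words must be present, in any order.
select id, name
from companies
where contains(name, 'CHELTENHAM AND BROUGHTONS') > 0;

-- Oracle Text matches whole tokens, so a word fragment should find nothing.
select id, name
from companies
where contains(name, 'BROUG') > 0;
```

Two things to keep in mind: by default a CONTEXT index is not updated automatically when rows change, so it has to be synchronized after DML, and very common words such as "OF" may be treated as stopwords depending on the stoplist in use.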