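I want to compare two columns and set a flag when the strings are identical, differ by at most one character, or one of them is missing a single character compared to the other. For example: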
select
name1
,name2
,case when "name1 is like name2 except only 1 different character, or
lack of 1 character compared to the other" then 1
else 0
end same_flag
from example
Sample output:
name1 - name2 - same_flag
john - jon - 1
sara - sarah - 1
filip - filis - 1
phillip - philis - 0
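I want it to work in both directions: in one row name1 may be the string that differs from name2, and in another row name2 may be the one that differs from name1.
Answer 0 (score: 2)
You can pick one of the functions from the utl_match package: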
with data (name1, name2) as (
select 'john', 'jon' from dual union all
select 'sara', 'sarah' from dual union all
select 'filip', 'filis' from dual union all
select 'phillip', 'philis' from dual
)
select name1, name2,
utl_match.edit_distance(name1, name2) as ed,
utl_match.edit_distance_similarity(name1, name2) as ed_similarity,
utl_match.jaro_winkler(name1, name2) as jw,
utl_match.jaro_winkler_similarity(name1, name2) as jw_similarity
from data;
This returns:
NAME1 | NAME2 | ED | ED_SIMILARITY | JW | JW_SIMILARITY
--------+--------+----+---------------+------+--------------
john | jon | 1 | 75 | 0.93 | 93
sara | sarah | 1 | 80 | 0.96 | 96
filip | filis | 1 | 80 | 0.92 | 92
phillip | philis | 2 | 72 | 0.91 | 90
Depending on your needs and which of these results you prefer, you can do the following:
case when utl_match.edit_distance(name1, name2) < 2 then 1 else 0 end
Or use the similarity percentage as a threshold (>= 75 keeps john/jon, whose similarity is exactly 75):
case when utl_match.edit_distance_similarity(name1, name2) >= 75 then 1 else 0 end
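Putting this together, here is a minimal sketch of the flag query, run against the same four sample rows rather than the real table. The alias same_flag comes from the question. The threshold of "at most 1" is an assumption about what counts as a match. Edit distance is symmetric, so it does not matter which of the two columns holds the shorter or altered name.

```sql
-- Sketch only: flag rows whose names are at most one edit apart.
-- The inline "data" rows stand in for the real table (called "example" in the question).
with data (name1, name2) as (
  select 'john', 'jon' from dual union all
  select 'sara', 'sarah' from dual union all
  select 'filip', 'filis' from dual union all
  select 'phillip', 'philis' from dual
)
select name1,
       name2,
       -- edit distance counts inserted, deleted and substituted characters
       case when utl_match.edit_distance(name1, name2) <= 1 then 1 else 0 end as same_flag
from data;
```

Given the ED column shown above (1, 1, 1, 2), the first three rows get same_flag = 1 and phillip/philis gets 0.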
Answer 1 (score: 1)
This is very linear: it just loops over the letters and counts the differences.
I've updated it, so Richard and Rchard are now treated as the same...
CREATE OR REPLACE FUNCTION compare_strings
(P_string1 IN VARCHAR2
,P_string2 IN VARCHAR2)
RETURN NUMBER
IS
--returns 1 if the strings are equal or differ by at most one substituted,
--missing or extra character, otherwise 0 (assumes both inputs are non-NULL)
l_long_string VARCHAR2(100) ;
l_short_string VARCHAR2(100) ;
l_diff_count NUMBER := 0 ;
l_result NUMBER ;
j NUMBER := 1 ; --position in the long string
k NUMBER := 1 ; --position in the short string
BEGIN
IF LENGTH(P_string1) >= LENGTH(P_string2) THEN
l_long_string := P_string1 ;
l_short_string := P_string2 ;
ELSE
l_long_string := P_string2 ;
l_short_string := P_string1 ;
END IF ;
--if one string is more than one char longer than the other then we must
--have more than one difference
IF LENGTH(l_long_string) - LENGTH(l_short_string) > 1 THEN
RETURN(0) ;
END IF ;
FOR i IN 1..LENGTH(l_long_string) LOOP
IF NVL(SUBSTR(l_long_string,j,1),'##') != NVL(SUBSTR(l_short_string,k,1),'##') THEN
l_diff_count := l_diff_count + 1 ;
IF LENGTH(l_long_string) = LENGTH(l_short_string) THEN
--same length: treat it as a substituted letter and shift along on the short string too
k := k + 1 ;
END IF ;
--always shift along one letter in the long string
j := j + 1 ;
ELSE
--shift along on both strings
j := j + 1 ;
k := k + 1 ;
END IF ;
EXIT WHEN l_diff_count > 1 ;
END LOOP ;
--at most one difference means the strings count as the same
IF l_diff_count > 1 THEN
l_result := 0 ;
ELSE
l_result := 1 ;
END IF ;
RETURN(l_result) ;
END compare_strings ;
/
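As a usage sketch, the function could be called straight from the query in the question. This assumes compare_strings has been compiled as a standalone schema-level function (if it lives in a package, prefix the call with the package name) and that it returns 1 for a match and 0 otherwise. The table name example and the alias same_flag are taken from the question.

```sql
-- Sketch only: assumes compare_strings is callable from SQL
-- and returns 1 when the two names are at most one character apart.
select name1,
       name2,
       compare_strings(name1, name2) as same_flag
from example;
```

The function first works out which argument is the longer one, so the order of the two columns in the call should not matter.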
Answer 2 (score: 0)
Try this, and adapt it to count the characters in both strings and compare the results.
How to find count and names of distinct characters in string in PL/SQL
It's not actually my answer, but it gives you a basis for counting the lengths and the number of differences.