Question

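I want to compare two columns and set a flag when the strings are the same, differ by at most one character, or one of them is missing a single character. For example: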

select
     name1
    ,name2
    ,case when "name1 is like name2 except for only 1 different character, or
               lacks 1 character compared to the other" then 1
          else 0
     end same_flag
from example

Sample output:

name1    -     name2    -  same_flag

john     -     jon      -        1
sara     -     sarah    -        1
filip    -     filis    -        1
phillip  -     philis   -        0

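I would like it to work in both directions: in one row name1 may be the string that differs from name2, and in another row name2 may be the one that differs from name1.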

Answer 1

You can pick one of the functions from the utl_match package:

with data (name1, name2) as (
  select 'john', 'jon' from dual union all
  select 'sara', 'sarah' from dual union all
  select 'filip', 'filis' from dual union all
  select 'phillip', 'philis' from dual
)
select name1, name2, 
       utl_match.edit_distance(name1, name2) as ed,
       utl_match.edit_distance_similarity(name1, name2) as ed_similarity,
       utl_match.jaro_winkler(name1, name2) as jw,
       utl_match.jaro_winkler_similarity(name1, name2) as jw_similarity
from data;

This returns:

NAME1   | NAME2  | ED | ED_SIMILARITY | JW   | JW_SIMILARITY
--------+--------+----+---------------+------+--------------
john    | jon    |  1 |            75 | 0.93 |            93
sara    | sarah  |  1 |            80 | 0.96 |            96
filip   | filis  |  1 |            80 | 0.92 |            92
phillip | philis |  2 |            72 | 0.91 |            90

Depending on your requirements and which of these results you prefer, you could do something like this:

case when utl_match.edit_distance(name1, name2) < 2 then 1 else 0 end

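Or use the similarity percentage as a threshold. Note that john/jon scores exactly 75 in the output above, so the comparison has to be inclusive to match the expected flags:

case when utl_match.edit_distance_similarity(name1, name2) >= 75 then 1 else 0 end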
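
Putting the pieces together, a minimal sketch of the complete query against the question's sample data might look like this. It assumes the flag should be 1 whenever the edit distance is 0 or 1; the inline `data` CTE only stands in for the real `example` table.

```sql
with data (name1, name2) as (
  select 'john', 'jon' from dual union all
  select 'sara', 'sarah' from dual union all
  select 'filip', 'filis' from dual union all
  select 'phillip', 'philis' from dual
)
select name1,
       name2,
       -- an edit distance of 0 or 1 means at most one character
       -- was inserted, deleted, or replaced
       case when utl_match.edit_distance(name1, name2) <= 1 then 1 else 0 end as same_flag
from data;
```

Going by the edit distances listed above (1, 1, 1, 2), this should flag the first three rows with 1 and phillip/philis with 0. Edit distance is symmetric, so the order of name1 and name2 does not matter.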

Answer 2

This is quite straightforward: it simply loops over the letters and counts the differences.

I have updated it: Richard and Rchard are now considered the same...

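  --Declare inside a package body, or prefix with CREATE OR REPLACE for a
  --standalone function.
  --Returns 1 if the strings are equal or differ by one replaced or missing
  --character, otherwise 0.
  FUNCTION compare_strings
   (P_string1        IN VARCHAR2
   ,P_string2        IN VARCHAR2)
  RETURN NUMBER
  IS

    l_long_string    VARCHAR2(100) ;
    l_short_string   VARCHAR2(100) ;
    l_same_length    BOOLEAN ;
    l_diff_count     NUMBER := 0 ;

    l_result         NUMBER ;

    j                NUMBER := 1 ;  --position in the long string
    k                NUMBER := 1 ;  --position in the short string

  BEGIN

    --LENGTH returns NULL for an empty string, so treat that as 0
    IF NVL(LENGTH(P_string1),0) >= NVL(LENGTH(P_string2),0) THEN
      l_long_string := P_string1 ;
      l_short_string := P_string2 ;
    ELSE
      l_long_string := P_string2 ;
      l_short_string := P_string1 ;
    END IF ;

    --if one string is more than one char longer than the other then we must
    --have more than one difference
    IF NVL(LENGTH(l_long_string),0) - NVL(LENGTH(l_short_string),0) > 1 THEN
      RETURN(0) ;
    END IF ;

    l_same_length := NVL(LENGTH(l_long_string),0) = NVL(LENGTH(l_short_string),0) ;

    FOR i IN 1..NVL(LENGTH(l_long_string),0) LOOP

      IF NVL(SUBSTR(l_long_string,j,1),'##') != NVL(SUBSTR(l_short_string,k,1),'##') THEN
        l_diff_count := l_diff_count + 1 ;
        --equal lengths: a replaced letter, so shift along on both strings
        --otherwise: a missing letter, so shift along one letter in the long
        --string but stay put in the short string
        IF l_same_length THEN
          k := k + 1 ;
        END IF ;
        j := j + 1 ;
      ELSE
        --shift along on both strings
        j := j + 1 ;
        k := k + 1 ;
      END IF ;
      EXIT WHEN l_diff_count > 1 ;

    END LOOP ;

    --at most one difference means the strings count as the same
    IF l_diff_count > 1 THEN
      l_result := 0 ;
    ELSE
      l_result := 1 ;
    END IF ;

    RETURN(l_result) ;

  END compare_strings ;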
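
For completeness, here is a rough sketch of how the function could be called from the question's query. It assumes compare_strings returns 1 for strings that are equal or differ by a single character, and that it has been compiled as a standalone function; if it lives in a package it would need to be declared in the package spec and called with the package name (not shown in the original) as a prefix.

```sql
-- Assumes a standalone function created with
-- CREATE OR REPLACE FUNCTION compare_strings ...
-- and the "example" table from the question.
select name1,
       name2,
       compare_strings(name1, name2) as same_flag
from example;
```

Because the function first works out which argument is the longer string, the result should be the same whichever column holds the shorter name.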

Answer 3

Try this and adapt it to count the length of both strings and compare them.

How to find count and names of distinct characters in string in PL/SQL

This is not actually my own answer, but it provides a basis for counting the lengths and the number of differing characters.
