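Treat two strings as equal in a CASE expression if they differ by only one character

Asked: 2018-06-06 11:42:03

Tags: sql oracle string-comparison case-when

I want to compare two columns and set a flag when the strings are the same apart from at most one differing character or one missing character. For example: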

select
     name1
    ,name2
    ,case when "name1 is like name2 except only 1 different character, or 
               lack of 1 character compared to the other" then 1 
          else 0 
     end same_flag
from example

Sample output:

name1    -     name2    -  same_flag

john     -     jon      -        1    
sara     -     sarah    -        1    
filip    -     filis    -        1    
phillip  -     philis   -        0

I want this to work in both directions: in one row name1 may be the variant of name2, and in another row name2 may be the variant of name1.

3 answers:

Answer 0 (score: 2)

You can pick one of the functions from the utl_match package:

with data (name1, name2) as (
  select 'john', 'jon' from dual union all
  select 'sara', 'sarah' from dual union all
  select 'filip', 'filis' from dual union all
  select 'phillip', 'philis' from dual
)
select name1, name2, 
       utl_match.edit_distance(name1, name2) as ed,
       utl_match.edit_distance_similarity(name1, name2) as ed_similarity,
       utl_match.jaro_winkler(name1, name2) as jw,
       utl_match.jaro_winkler_similarity(name1, name2) as jw_similarity
from data;

This returns:

NAME1   | NAME2  | ED | ED_SIMILARITY | JW   | JW_SIMILARITY
--------+--------+----+---------------+------+--------------
john    | jon    |  1 |            75 | 0.93 |            93
sara    | sarah  |  1 |            80 | 0.96 |            96
filip   | filis  |  1 |            80 | 0.92 |            92
phillip | philis |  2 |            72 | 0.91 |            90

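Depending on your requirements and which results you prefer, you could then do something like:

case when utl_match.edit_distance(name1, name2) < 2 then 1 else 0 end

or use the percentage as a threshold. Note that the percentage depends on the string lengths (one edit gives 75 for john/jon but 80 for sara/sarah), so it only approximates "one character":

case when utl_match.edit_distance_similarity(name1, name2) >= 75 then 1 else 0 end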
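
Putting the edit-distance check into a complete query might look like the sketch below. It reuses the inline sample data from this answer rather than the real table, and the column alias same_flag is taken from the question. Because edit distance is symmetric, it does not matter which of the two columns holds the shorter or altered name.

```sql
-- Sketch: flag rows whose names are at most one edit apart
-- (one substituted, one missing, or one extra character).
with data (name1, name2) as (
  select 'john', 'jon' from dual union all
  select 'sara', 'sarah' from dual union all
  select 'filip', 'filis' from dual union all
  select 'phillip', 'philis' from dual
)
select name1,
       name2,
       case
         when utl_match.edit_distance(name1, name2) <= 1 then 1
         else 0
       end as same_flag
from data;
```

With the edit distances shown in the result table above (1, 1, 1 and 2), this yields the flags 1, 1, 1 and 0, which matches the sample output in the question. Identical strings have an edit distance of 0, so they are flagged as 1 as well.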

Answer 1 (score: 1)

This is very linear: it just loops over the letters and counts the differences.

I have updated it, so Richard and Rchard are now considered the same...

  CREATE OR REPLACE FUNCTION compare_strings
   (P_string1        IN VARCHAR2
   ,P_string2        IN VARCHAR2)
  RETURN NUMBER
  IS

    l_long_string    VARCHAR2(4000) ;
    l_short_string   VARCHAR2(4000) ;
    l_diff_count     NUMBER := 0 ;

    j                NUMBER := 1 ;  --position in the long string
    k                NUMBER := 1 ;  --position in the short string

  BEGIN

    --nothing to compare if either input is NULL, so report "not the same"
    IF P_string1 IS NULL OR P_string2 IS NULL THEN
      RETURN 0 ;
    END IF ;

    IF LENGTH(P_string1) >= LENGTH(P_string2) THEN
      l_long_string := P_string1 ;
      l_short_string := P_string2 ;
    ELSE
      l_long_string := P_string2 ;
      l_short_string := P_string1 ;
    END IF ;

    --if one string is more than one char longer than the other then we must
    --have more than one difference
    IF LENGTH(l_long_string) - LENGTH(l_short_string) > 1 THEN
      RETURN 0 ;
    END IF ;

    FOR i IN 1..LENGTH(l_long_string) LOOP

      --compare the long and short strings (not the raw parameters), so the
      --result is the same whichever argument is the longer one
      IF NVL(SUBSTR(l_long_string,j,1),'##') != NVL(SUBSTR(l_short_string,k,1),'##') THEN
        l_diff_count := l_diff_count + 1 ;
        j := j + 1 ;
        --equal lengths: the difference is a substituted letter, so shift along
        --both strings; otherwise a letter is missing, so stay put in the short string
        IF LENGTH(l_long_string) = LENGTH(l_short_string) THEN
          k := k + 1 ;
        END IF ;
      ELSE
        --shift along on both strings
        j := j + 1 ;
        k := k + 1 ;
      END IF ;
      EXIT WHEN l_diff_count > 1 ;

    END LOOP ;

    --1 = same apart from at most one character, 0 = more than one difference
    IF l_diff_count > 1 THEN
      RETURN 0 ;
    ELSE
      RETURN 1 ;
    END IF ;

  END compare_strings ;
  /
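
Once compiled, a function like this can be called directly from the query in the question. A minimal sketch, assuming compare_strings is available as a schema-level function (if it lives in a package, prefix the call with the package name) and that the example table from the question exists:

```sql
-- Assumption: compare_strings returns 1 when the two names are the same
-- apart from at most one character, and 0 otherwise.
select name1,
       name2,
       compare_strings(name1, name2) as same_flag
from example;
```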

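Answer 2 (score: 0)

Try this and adapt it to count the lengths of both strings and compare them.

How to find count and names of distinct characters in string in PL/SQL

It is not actually my answer, but it gives you a basis for counting the lengths and the number of differing characters.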