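How can I get the number of overlapping characters in PHP or MySQL?

Asked: 2011-11-30 13:07:22

Tags: php mysql string-comparison

I want to count the number of overlapping characters when comparing two strings. Suppose you make these comparisons: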

       boel <-> baal
       boel <-> bol
beestenboel <-> boelsten
beestenboel <-> baastenb
      hallo <-> hello

The results should be as follows:

BoeL         } b matches, o does not match,
BaaL         } e does not match, l matches.
               Result: overlap = 2

BOeL         } b matches, o matches, l matches
BO L         } e does not match (it's not present in the lower string).
               Result: overlap = 3

B EeSTENboel } b matches, e matches (because o is only present in the lower
BoElSTEN     } string), the second e is no longer present (since we have
               already consumed an e from the lower string), l does not match,
               s, t, e, n match successively.
               (Notice that b, e, o and l from the upper string will be ignored,
               since all characters from the lower string have already been
               consumed.)
               Result: overlap = 6

BeeSTENBoel  } b matches, the two e's do not match with the two a's, and again,
BaaSTENB     } s, t, e, n match.
               Result: overlap = 6

HaLLO        } h matches, a doesn't match
HeLLO        } l, l and o match.
               Result: overlap = 4

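I suspect I am overcomplicating this... How can I produce the results above in MySQL or PHP?

(I suppose the Levenshtein algorithm is related to this problem.)

4 answers:

Answer 0: (score: 0)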

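This description reminds me of all the DNA alignment algorithms I learned about during my studies. I am not entirely sure you need everything they do, but have a look at Needleman-Wunsch and Smith-Waterman.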
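As a rough illustration of the alignment idea, here is a minimal PHP sketch. It assumes that "overlap" means the length of the longest common subsequence, which is what a Needleman-Wunsch alignment scores when a match counts 1 and mismatches and gaps count 0. That reading is an assumption, not something the question states, although it does reproduce the five results listed there. The function name `overlap_count` is made up for this example.

```php
<?php
// Hypothetical helper (not from the original post): length of the longest
// common subsequence, i.e. a Needleman-Wunsch score with match = 1 and
// mismatch = gap = 0. Compares bytes, so multibyte text would need
// splitting into characters first (for example with mb_str_split()).
function overlap_count(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // $prev holds the previous row of the dynamic-programming table.
    $prev = array_fill(0, $n + 1, 0);
    for ($i = 1; $i <= $m; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $n; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Characters match: extend the best alignment of both prefixes.
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                // No match: skip a character in one string or the other.
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }
    return $prev[$n];
}

$pairs = [
    ['boel', 'baal'],
    ['boel', 'bol'],
    ['beestenboel', 'boelsten'],
    ['beestenboel', 'baastenb'],
    ['hallo', 'hello'],
];
foreach ($pairs as [$x, $y]) {
    printf("%s <-> %s: %d\n", $x, $y, overlap_count($x, $y));
}
```

Only two rows of the table are kept in memory, so the space needed grows with the length of the second string while the running time grows with the product of both lengths.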

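Answer 1: (score: 0)

As you mentioned yourself, the Levenshtein algorithm is most likely what you need, so I suggest you give it a try. I am not sure whether it will return exactly the results you want, but you should read through all the comments on the page; there is a lot of gold to be harvested in the comment section.

If you have superuser privileges on your server, you can also install this on your MySQL server, thanks to Matthieu Aubry.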
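On the PHP side, the built-in `levenshtein()` returns an edit distance rather than an overlap count, but the two can be related. The sketch below assumes that "overlap" means the longest common subsequence (an interpretation, not something the question confirms). When a replacement costs as much as a deletion plus an insertion, the distance equals the combined length of both strings minus twice the overlap. The helper name `overlap_via_levenshtein` is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answer). With insertion and
// deletion costing 1 and replacement costing 2, the edit distance equals
// strlen($a) + strlen($b) - 2 * (number of overlapping characters).
function overlap_via_levenshtein(string $a, string $b): int
{
    // Argument order: insertion cost, replacement cost, deletion cost.
    $distance = levenshtein($a, $b, 1, 2, 1);
    return intdiv(strlen($a) + strlen($b) - $distance, 2);
}

echo levenshtein('hallo', 'hello'), "\n";             // plain edit distance: 1
echo overlap_via_levenshtein('hallo', 'hello'), "\n"; // overlap count: 4
```

Note that PHP versions before 8.0 limit `levenshtein()` to strings of at most 255 characters, so this shortcut is only suitable for short inputs there.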

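Answer 2: (score: 0)

Answer 3: (score: 0)

Yes, you can use the Levenshtein distance, but since your column is in English and assuming it holds a single word, you can use Soundex, which can be applied to find matching strings, for example: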

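-- Note: DIFFERENCE() is a SQL Server (T-SQL) function and is not available in MySQL;
-- MySQL offers SOUNDEX() and the SOUNDS LIKE operator instead.
SELECT
  Word,
  SOUNDEX(Word) AS SoundTest,
  DIFFERENCE(Word, 'textentered') AS DiffTest
FROM
  YourTable;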
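To get a feel for what Soundex returns, here is a small PHP sketch using the built-in `soundex()` on the words from the question. It is only meant to illustrate the kind of value you get: a phonetic code per word, which tells you whether two words sound alike but not how many characters they share.

```php
<?php
// soundex() is a PHP built-in; the word list is taken from the question.
foreach (['boel', 'baal', 'bol', 'hallo', 'hello'] as $word) {
    printf("%-6s %s\n", $word, soundex($word));
}
```

Here boel, baal and bol all map to B400, and hallo and hello both map to H400, so Soundex groups these words together without distinguishing an overlap of 2 from an overlap of 3.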