I want to count the number of overlapping characters when comparing two strings. Suppose I make these comparisons:
boel <-> baal
boel <-> bol
beestenboel <-> boelsten
beestenboel <-> baastenb
hallo <-> hello
The results should be as follows:
BoeL } b matches, o does not match,
BaaL } e does not match, l matches.
Result: overlap = 2
BOeL } b matches, o matches, l matches
BO L } e does not match (it's not present in the lower string).
Result: overlap = 3
B EeSTENboel } b matches, e matches (because o is only present in the lower
BoElSTEN } string), the second e is no longer present (since we have
already consumed an e from the lower string), l does not match,
s, t, e, n match successively.
(Notice that b, e, o and l from the upper string will be ignored,
since all characters from the lower string have already been
consumed.)
Result: overlap = 6
BeeSTENBoel } b matches, the two e's do not match with the two a's, and again,
BaaSTENB } s, t, e, n match.
Result: overlap = 6
HaLLO } h matches, a doesn't match
HeLLO } l, l and o match.
Result: overlap = 4
I suspect I'm overcomplicating this... How can I produce the results above in MySQL or PHP?
(I think the Levenshtein algorithm is related to this problem.)
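Answer 0 (score: 0)
This description reminds me of the DNA alignment algorithms I learned during my studies. I'm not entirely sure you need everything they do, but take a look at Needleman-Wunsch and Smith-Waterman.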
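One way to read this suggestion: with a score of 1 for a match and 0 for mismatches and gaps, Needleman-Wunsch global alignment reduces to the length of the longest common subsequence (LCS). The sketch below is written in Python for illustration only (the loops port directly to PHP), and the function name `overlap` is made up here. It reproduces the five expected results from the question, but whether LCS is exactly the intended rule for every input is an assumption.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Assumption: "overlap" in the question means LCS length.
    """
    # prev[j] = LCS length of the already processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS against the empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best alignment of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a character in a or in b, whichever is better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


pairs = [
    ("boel", "baal"),
    ("boel", "bol"),
    ("beestenboel", "boelsten"),
    ("beestenboel", "baastenb"),
    ("hallo", "hello"),
]
for upper, lower in pairs:
    print(f"{upper} <-> {lower}: overlap = {overlap(upper, lower)}")
```

For the five pairs in the question this prints overlaps of 2, 3, 6, 6 and 4. Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.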
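Answer 1 (score: 0)
As you mentioned yourself, the Levenshtein algorithm is most likely what you need, so I suggest you try it. I'm not sure it will return exactly the results you want, but you should read all the comments on the page; there is a lot of gold to be harvested in the comment section.
If you have superuser access to your server, you can also install this on your MySQL server, thanks to Matthieu Aubry.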
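Plain Levenshtein distance counts edits rather than shared characters, so it does not give the overlap directly. There is a known relationship, though: if a substitution costs 2 (the same as one deletion plus one insertion), the distance equals len(a) + len(b) - 2 * LCS(a, b), so an LCS-style overlap can be recovered from it. PHP's built-in `levenshtein()` accepts custom insertion, replacement and deletion costs. The Python sketch below implements the weighted distance by hand to show the idea. The names `weighted_levenshtein` and `overlap_from_distance` are illustrative, and treating the overlap as LCS length is an assumption that happens to match the question's examples.

```python
def weighted_levenshtein(a: str, b: str, ins: int = 1, rep: int = 1, dele: int = 1) -> int:
    """Edit distance for turning a into b with configurable operation costs."""
    # prev[j] = cost of turning the already processed prefix of a into b[:j].
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ch_a in enumerate(a, start=1):
        curr = [i * dele]  # turning a[:i] into the empty string: i deletions
        for j, ch_b in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ch_a == ch_b else rep)
            curr.append(min(substitute, prev[j] + dele, curr[j - 1] + ins))
        prev = curr
    return prev[-1]


def overlap_from_distance(a: str, b: str) -> int:
    """Recover the LCS length from a distance with replacement cost 2."""
    # With rep=2: distance = len(a) + len(b) - 2 * overlap.
    distance = weighted_levenshtein(a, b, ins=1, rep=2, dele=1)
    return (len(a) + len(b) - distance) // 2


for upper, lower in [("boel", "baal"), ("boel", "bol"), ("hallo", "hello")]:
    plain = weighted_levenshtein(upper, lower)
    print(f"{upper} <-> {lower}: distance = {plain}, overlap = {overlap_from_distance(upper, lower)}")
```

With the default costs, "boel" and "baal" have distance 2 and "hallo" and "hello" have distance 1, which shows why the unweighted distance alone does not answer the question.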
Answer 3 (score: 0)
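Yes, you can use Levenshtein distance, but since your column is in English, and assuming each value is a single word, you can also use Soundex, which can be applied to find matching strings: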
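-- Note: DIFFERENCE() is a SQL Server (T-SQL) function; MySQL has SOUNDEX()
-- and SOUNDS LIKE, but no DIFFERENCE().
SELECT
    Word,
    SOUNDEX(Word) AS SoundTest,                  -- phonetic code of the stored word
    DIFFERENCE(Word, 'textentered') AS DiffTest  -- 0 (no similarity) to 4 (strong similarity)
FROM
    YourTable;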