Question

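If I have two variables and want to see how many characters they have in common, what can I do to find out how many characters are wrong? For example: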

a = "word"
b = "wind"
a - b = 2

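Is there a way to do this, or to make the above work?

Edit: the order should also be taken into account in the calculation.

Edit 2: all of these should come out as follows: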

a = bird
b = word
<program to find answer> 2


a = book
b = look
<program to find answer> 3


a = boat
b = obee
<program to find answer> 0

a = fizz
b = faze
<program to find answer> 2

Answer 1

You can do the following:

sum(achar != bchar for achar, bchar in zip(a,b))

This works when the strings are the same length. If their lengths might differ, you can also account for the difference:

sum(achar != bchar for achar, bchar in zip(a,b)) + abs(len(a) - len(b))

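Note, though, that this only lets words match from the beginning. The difference between wordy and word would be 1, while the difference between wordy and {{1}} would be 5. If you want that difference to be 1, you need more complex logic.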
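Here is a minimal sketch that wraps the second expression in a function. The name positional_diff is hypothetical. It makes the limitation visible: an extra character at the end costs 1, but a single extra character at the front shifts every position and costs much more.

```python
def positional_diff(a: str, b: str) -> int:
    """Count positions where a and b differ, plus the length difference.

    zip() stops at the end of the shorter string, so the leftover
    characters of the longer string are added via abs(len(a) - len(b)).
    """
    mismatches = sum(achar != bchar for achar, bchar in zip(a, b))
    return mismatches + abs(len(a) - len(b))


print(positional_diff("word", "wind"))   # 'o'/'i' and 'r'/'n' differ
print(positional_diff("wordy", "word"))  # only the trailing 'y' is extra
print(positional_diff("aword", "word"))  # one extra leading char misaligns everything
```

The last call returns 5: four misaligned positions plus one extra character. That is the situation where an edit distance such as Levenshtein is the better fit.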

Answer 2

This may not work in every case, but if you just want to compare characters, you can use a set:

a = "word"
b = "wind"

common = set.intersection(set(a), set(b))  # unique characters present in both strings
print(len(common))
>> 2

Because the characters are collapsed into a set of unique characters, the order of the sequence is ignored.
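To see what ignoring order means in practice, here is a small sketch that compares the set approach with collections.Counter. Counter is my own swap-in as an alternative that also counts repeated letters. Both function names are hypothetical. Neither function looks at positions, so neither reproduces the order-sensitive examples in the question's Edit 2.

```python
from collections import Counter


def shared_unique_chars(a: str, b: str) -> int:
    # A set keeps one copy of each character and ignores position.
    return len(set(a) & set(b))


def shared_chars_with_repeats(a: str, b: str) -> int:
    # Counter intersection keeps the minimum count of each character,
    # so repeated letters count more than once; position is still ignored.
    return sum((Counter(a) & Counter(b)).values())


print(shared_unique_chars("book", "look"))        # {'o', 'k'}
print(shared_chars_with_repeats("book", "look"))  # 'o' twice, 'k' once
print(shared_unique_chars("boat", "obee"))        # {'b', 'o'}, though the positional answer is 0
```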

Another interesting Python standard library module you can use is difflib:

from difflib import Differ

d = Differ()

a = "word"
b = "wind"

[i for i in d.compare(a,b) if i.startswith('-')]
>>['- o', '- r']

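difflib essentially gives you ways to compare sequences, such as strings. With the Differ object above, you can compare two strings and identify the characters that must be added or removed to get from string a to string b. In the example above, the list comprehension keeps only the characters removed from a on the way to b. You can also check for lines starting with + to find the characters that were added: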

[i for i in d.compare(a,b) if i.startswith('+')]
>>['+ i', '+ n']

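Or you can find the characters common to both sequences. This answers how to check how many characters one variable has in common with another: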

common = [i for i in d.compare(a,b) if i.startswith('  ')]
print(common, len(common))
>> ['  w', '  d'] 2

You can read more about the Differ object in the Python difflib documentation.
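The three filters above can be combined into a single pass. This is a minimal sketch that assumes you only need the counts. It tallies the first character of each line Differ yields: a space for common characters, - for removed ones and + for added ones. The function name differ_summary is hypothetical. Dropping ? hint lines is only a precaution, because Differ emits them only when it pairs up similar but unequal lines, and two different single characters are never similar enough.

```python
from collections import Counter
from difflib import Differ


def differ_summary(a: str, b: str) -> Counter:
    """Tally Differ output lines by their first character.

    Differ treats each character of a string as a 'line' and prefixes
    each output line with '  ' (common), '- ' (only in a) or '+ ' (only in b).
    """
    tags = Counter(line[0] for line in Differ().compare(a, b))
    tags.pop("?", None)  # ignore intraline hint lines if any appear
    return tags


summary = differ_summary("word", "wind")
print(summary[" "], summary["-"], summary["+"])  # common, removed, added
```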

Answer 3

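What you are describing is an edit-distance metric between words. Hamming distance has been mentioned, but it cannot properly handle words of different lengths because it only accounts for substitutions. Other common metrics include "longest common substring", "Levenshtein distance" and "Jaro distance".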

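Your question seems to describe the Levenshtein distance. It is defined as the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one word into the other. If you want to read more about the topic, the Wikipedia page is very thorough. As far as coding goes, a library already exists on pip: {{3}}. It implements the algorithm in C for faster execution.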

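For example, install it with:

pip install python-Levenshtein

Below is also a recursive implementation with plenty of comments to help you understand how the algorithm works.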

from functools import lru_cache

@lru_cache(maxsize=4095)  # the recursive approach computes some substring pairs many times,
                          # so we cache the results and reuse them to speed things up
def ld(s, t):
    if not s: return len(t)  # base case: one string is empty, so the remaining
    if not t: return len(s)  # length of the other must be added (insert that many chars)

    if s[0] == t[0]:  # equal chars do not increase the edit distance
        return ld(s[1:], t[1:])  # drop the matching chars and measure the rest
    else:  # the next char must be edited, so try insertion, deletion and substitution
        l1 = ld(s, t[1:])      # insert t[0] into s (equivalently, delete it from t)
        l2 = ld(s[1:], t)      # delete s[0] (equivalently, insert it into t)
        l3 = ld(s[1:], t[1:])  # substitute s[0] with t[0]
        # take the minimum distance of the three cases and add 1 for this edit
        return 1 + min(l1, l2, l3)

And to test it:

>>> ld('kitten', 'sitting')  # substitute k->s, substitute e->i, insert g
3
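The recursive version is good for understanding the idea. However, each call shortens the inputs by at most one character per string, so the recursion depth can reach len(s) + len(t). Long strings can therefore hit Python's default recursion limit. A common alternative is the bottom-up dynamic-programming (Wagner-Fischer) formulation. The sketch below is my own illustration, not necessarily what the C library does internally. The function name levenshtein is hypothetical, and the sketch keeps only two rows of the table.

```python
def levenshtein(s: str, t: str) -> int:
    """Bottom-up Levenshtein distance keeping only two rows of the DP table.

    While row i is being filled, prev[j] is the distance between s[:i-1]
    and t[:j], and cur[j] becomes the distance between s[:i] and t[:j].
    """
    prev = list(range(len(t) + 1))  # "" -> t[:j] takes j insertions
    for i, sc in enumerate(s, start=1):
        cur = [i]  # s[:i] -> "" takes i deletions
        for j, tc in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete sc
                cur[j - 1] + 1,            # insert tc
                prev[j - 1] + (sc != tc),  # substitute (free when chars match)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("word", "wind"))
```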

Answer 4

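Count the common characters and subtract that count from the length of the longer string. Based on your edits and comments, I think this is what you are looking for: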

def find_uncommon_chars(word1, word2):
    # select shorter and longer word
    shorter = word1
    longer = word2
    if len(shorter) > len(longer):
        shorter = word2
        longer = word1

    # count common chars
    count = 0
    for i in range(len(shorter)):
        if shorter[i] == longer[i]:
            count += 1
    # if you return just count you have number of common chars
    return len(longer) - count
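As the comment in the function notes, returning just count gives the number of common characters. That is what the Edit 2 examples in the question list. Here is a compact sketch of that variant, checked against those four pairs. The name count_common_positions is hypothetical.

```python
def count_common_positions(word1: str, word2: str) -> int:
    # Count characters that are equal at the same index. zip() stops at
    # the end of the shorter word, like the loop over range(len(shorter)).
    return sum(c1 == c2 for c1, c2 in zip(word1, word2))


examples = [("bird", "word", 2), ("book", "look", 3), ("boat", "obee", 0), ("fizz", "faze", 2)]
for a, b, expected in examples:
    print(a, b, count_common_positions(a, b), "expected:", expected)
```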
