Question

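Below is the code I wrote to count the substrings of length 2 that two input strings have in common. The substrings must also be at the same position in both strings.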

def string_match(a, b):
  count=0
  shorter=min(len(a),len(b))
  for i in range(shorter):
    if(a[i:i+2]==b[i:i+2]):
      count=count+1
    else:
      continue
  return count

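The code works for strings of different lengths but gives wrong answers for strings of the same length. For example, 'abc' and 'abc' should return 2 but it returns 3, and 'abc' and 'axc' should return 0 but it returns 1. The problem can be fixed by changing range(shorter) to range(shorter-1), but I don't understand why. Also, if possible, please suggest how to change the code above so that it counts identical substrings regardless of their position in the two strings.

Thanks in advance!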

Answer 1

Some good old print debugging should make things clearer:

#!/usr/bin/env python2
#coding=utf8

def string_match(a, b):
    count=0
    shorter=min(len(a),len(b))
    print 'comparing', a, b
    for i in range(shorter):
        x = a[i:i+2]
        y = b[i:i+2]
        print 'checking substrings at %d: ' % i, x, y
        if x == y:
            count=count+1
        else:
            continue
    return count


for a, b in (('abc', 'abc'), ('abc', 'axc')):
    count = string_match(a,b)
    print a, b, count

Output:

so$ ./test.py 
comparing abc abc
checking substrings at 0:  ab ab
checking substrings at 1:  bc bc
checking substrings at 2:  c c
abc abc 3
comparing abc axc
checking substrings at 0:  ab ax
checking substrings at 1:  bc xc
checking substrings at 2:  c c
abc axc 1

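See the problem? The last comparison is always between substrings of length 1. That is because 'abc'[2:4] only gives you 'c'.

So when you compare substrings of length n, you need to stop n-1 steps early, which for length 2 means one step early. That is exactly what your -1 change does, and that is why it helps.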

With the -1 change:

#!/usr/bin/env python2
#coding=utf8

def string_match(a, b):
    count=0
    shorter=min(len(a),len(b))
    print 'comparing', a, b
    for i in range(shorter-1):
        x = a[i:i+2]
        y = b[i:i+2]
        print 'checking substrings at %d: ' % i, x, y
        if x == y:
            count=count+1
        else:
            continue
    return count


for a, b in (('abc', 'abc'), ('abc', 'axc')):
    count = string_match(a,b)
    print a, b, count

New output:

so$ ./test.py 
comparing abc abc
checking substrings at 0:  ab ab
checking substrings at 1:  bc bc
abc abc 2
comparing abc axc
checking substrings at 0:  ab ax
checking substrings at 1:  bc xc
abc axc 0
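The same reasoning carries over to any substring length. The following is a minimal sketch, assuming Python 3; the function name count_aligned_substrings and the parameter k are illustrative and not part of the original code:

```python
def count_aligned_substrings(a, b, k=2):
    """Count positions where a and b hold the same length-k substring."""
    shorter = min(len(a), len(b))
    count = 0
    # The last valid start index is shorter - k, so the range ends at shorter - k + 1.
    # For k = 2 this is range(shorter - 1), i.e. the -1 change.
    for i in range(shorter - k + 1):
        if a[i:i + k] == b[i:i + k]:
            count += 1
    return count
```

If either string is shorter than k, the range is empty and the function returns 0.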

Answer 2

Check your for loop:

for i in range(shorter):
    if a[i:i+2]==b[i:i+2]:
        count=count+1
    else:
        continue

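By default, range(n) goes from 0 to n-1. So what happens at i = n-1? Your loop takes the slice [n-1:n+1], which asks for the characters at indices n-1 and n. But the shorter string has only n characters, so Python simply returns that one letter instead of two. When the strings differ in length, the longer string still returns two letters, so the comparison fails and no harm is done. When they have the same length, both slices are a single letter, so two strings that end in the same character produce a false positive. This is why range(shorter - 1) is necessary.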
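The difference between slicing and indexing past the end of a string can be checked directly. A small sketch, assuming Python 3; the variable names are illustrative:

```python
s = 'abc'

# Slicing past the end is clamped silently: only the characters that exist are returned.
last_slice = s[2:4]      # 'c', not an error
empty_slice = s[5:7]     # '' when the start is out of range as well

# Indexing past the end, by contrast, raises IndexError.
try:
    s[3]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(repr(last_slice), repr(empty_slice), index_error_raised)
```

Because the slice never raises, the original loop runs to completion and the extra comparison goes unnoticed.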

Also, the continue is redundant, because the loop moves on to the next iteration by default.

To count common substrings of length 2 anywhere in the strings, this should be enough:

def string_match(string1, string2):
    string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
    count = 0
    for i in range(len(string2) - 1):
        if string2[i:i+2] in string1subs:
            count += 1
    return count

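This creates a list, string1subs, containing all substrings of length 2 in string1. It then loops over all substrings of length 2 in string2 and checks whether each one appears in that list. If you prefer a more concise version: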

def string_match(string1, string2):
    string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
    return sum(string2[i:i+2] in string1subs for i in range(len(string2) - 1))

This is exactly the same version, using sum and the fact that True equals 1 in Python.
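A membership test on a list scans it element by element, so for long inputs a set may be preferable. This is one possible variant, assuming Python 3 and the same counting rule as above (every position in string2 whose pair occurs anywhere in string1 is counted); the name string_match_anywhere is illustrative:

```python
def string_match_anywhere(string1, string2):
    # Collect every length-2 substring of string1 once; set lookups are O(1) on average.
    pairs = {string1[i:i + 2] for i in range(len(string1) - 1)}
    # Count each position in string2 whose length-2 substring occurs in string1.
    return sum(string2[i:i + 2] in pairs for i in range(len(string2) - 1))
```

Note that a pair repeated in string2 is counted once per occurrence, for example string_match_anywhere('ab', 'abab') gives 2.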

Answer 3

The best way is to avoid index access altogether:

def string_match(a, b):
    count = 0
    equal = False
    for c, d in zip(a,b):
        count += equal and c == d
        equal = c == d
    return count

Or with a generator expression:

from itertools import islice
def string_match(a, b):
    return sum(a1 == b1 and a2 == b2
        for a1, a2, b1, b2 in zip(a, islice(a,1,None), b, islice(b,1,None)))
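Both versions rely on how zip handles inputs of unequal length. A short sketch of that behavior, assuming Python 3 and using plain slicing (a[1:]) in place of islice for brevity; the variable names are illustrative:

```python
a, b = 'abcde', 'abx'

# zip stops as soon as the shortest input is exhausted, so no explicit
# min(len(a), len(b)) or "- 1" bookkeeping is needed.
aligned_chars = list(zip(a, b))

# Pairing a string with itself shifted by one yields adjacent character pairs;
# the shifted copy is one element shorter, which drops the trailing single character.
adjacent_in_a = list(zip(a, a[1:]))
```

Here aligned_chars has three entries and adjacent_in_a has four, one fewer than the length of a.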

Why doesn't the code work for strings of the same length?
