Python: check whether a substring of one string is also a substring of another string, given that the first substring has a particular length

Time: 2014-10-22 00:05:41

Tags: python regex string

I have two strings:

23971500000239713000002344550000023971900000

23971500000239719000002344550000023971600000

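I want to test whether at least one substring of length 10 or more from the first string also occurs in the second string. For the strings above the check should evaluate to true, because

23971500000 is in both strings.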

5 answers:

Answer 0: (score: 4)

s1 = "23971500000239713000002344550000023971900000"

s2 = "23971500000239719000002344550000023971600000"

# Every 10-character window of s1 (range rather than the Python 2-only xrange)
test = (s1[x:x+10] for x in range(len(s1)-9))
print(any(x in s2 for x in test))

Answer 1: (score: 2)

>>> s1 = '23971500000239713000002344550000023971900000'
>>> s2 = '23971500000239719000002344550000023971600000'
>>> minlen = 10

>>> subs = (s1[ii:ii+minlen] for ii in range(len(s1) - minlen + 1))
>>> any(sub in s2 for sub in subs)
True

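That is, generate every substring of the minimum length from one of the strings and check whether any of them is in the other. Checking only the minimum length is enough, because any longer common substring contains a common substring of exactly that length.

Of course, if you have incredibly long strings there are more efficient solutions (see Boyer-Moore for inspiration), but the above appears to do what you need and is very simple.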
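
For longer inputs, one cheap improvement (not Boyer-Moore, just a set lookup) is to collect the fixed-length windows of one string once and probe them with the windows of the other. This is a minimal sketch under that assumption; `share_substring` is a name invented here for illustration, and `minlen` is assumed to be at least 1.

```python
def share_substring(a, b, minlen=10):
    """Return True if a and b share a substring of at least minlen characters.

    Hypothetical helper: any common substring longer than minlen contains
    a common substring of exactly minlen, so only that length is checked.
    """
    # All minlen-character windows of b, stored for average O(1) lookup.
    windows = {b[i:i + minlen] for i in range(len(b) - minlen + 1)}
    # Stop at the first window of a that also occurs in b.
    return any(a[i:i + minlen] in windows
               for i in range(len(a) - minlen + 1))


s1 = '23971500000239713000002344550000023971900000'
s2 = '23971500000239719000002344550000023971600000'
print(share_substring(s1, s2))
```

The trade-off is memory: the set holds up to len(b) - minlen + 1 windows, which is fine for strings like the ones in the question.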

Answer 2: (score: 2)

str1 = "23971500000239713000002344550000023971900000"
str2 = "23971500000239719000002344550000023971600000"

def subsearch(str1, str2):
    # Slide a 10-character window over str1 and look for it in str2.
    for i in range(len(str1) - 9):
        if str1[i:i + 10] in str2:
            return True
    return False

print(subsearch(str1, str2))
# Output: True

Answer 3: (score: 1)

s1 = '23971500000239713000002344550000023971900000'
s2 = '23971500000239719000002344550000023971600000'

def find_all_substrings(s):
    return [ s[i:i+10] for i in range(len(s)) if len(s[i:i+10]) == 10 ]

common_substrings = [s for s in find_all_substrings(s1) if s in s2]

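Your question is vague about what should happen once you have established that there is a common substring. But the condition is true whenever len(common_substrings) > 0.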
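
If what you want next is the longest shared run rather than a yes/no answer, the standard library's difflib can find it. This is only a sketch of one option, not what the list comprehension above does; `longest_common` is a name made up for illustration, and `autojunk=False` is passed so that no characters get ignored as "popular".

```python
from difflib import SequenceMatcher

s1 = '23971500000239713000002344550000023971900000'
s2 = '23971500000239719000002344550000023971600000'


def longest_common(a, b):
    # Hypothetical helper: return the longest block of characters
    # that appears contiguously in both a and b.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


common = longest_common(s1, s2)
# The original condition is then just a length check.
print(common, len(common) >= 10)
```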

Answer 4: (score: 0)

For completeness, here is a regex solution, just to show that it is possible.

I do not recommend using this solution in production code.

import re

s1 = "23971500000239713000002344550000023971900000"
s2 = "23971500000239719000002344550000023971600000"

# Since both strings contain only digits, ~ can be safely used as separator
s = s1 + '~' + s2

Finding a match:

re.match(r'^\d*?(\d{10,})\d*~\d*\1', s)

Finding the longest match starting at each index:

r = re.compile(r'(\d{10,})\d*~\d*\1')
o = [r.match(s, i) for i in range(0, len(s1))]

# Print result
print([i.group(1) if i else None for i in o])

Output for the example:

['2397150000023971', '397150000023971', '97150000023971', '7150000023971', '150000023971', '50000023971', '0000023971', None, None, None, None, None, None, None, None, None, None, '000002344550000023971', '00002344550000023971', '0002344550000023971', '002344550000023971', '02344550000023971', '2344550000023971', '344550000023971', '44550000023971', '4550000023971', '550000023971', '50000023971900000', '0000023971900000', '000023971900000', '00023971900000', '0023971900000', '023971900000', '23971900000', '3971900000', None, None, None, None, None, None, None, None, None]

You would need to write more code to drop the matches that are merely suffixes of a previous match.
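
A minimal sketch of that extra step, assuming the same `s1`, `s2` and pattern as above: a match is a suffix of an earlier one exactly when its captured group ends at the same position, so it is enough to keep only the matches whose group ends further right than anything kept so far. The names `maximal` and `last_end` are introduced here for illustration.

```python
import re

s1 = "23971500000239713000002344550000023971900000"
s2 = "23971500000239719000002344550000023971600000"

# Both strings contain only digits, so ~ is a safe separator.
s = s1 + '~' + s2
r = re.compile(r'(\d{10,})\d*~\d*\1')

maximal = []   # matches that are not suffixes of an earlier match
last_end = -1  # end position (in s) of the last group that was kept
for i in range(len(s1)):
    m = r.match(s, i)
    # A later start with the same end is just a suffix; skip it.
    if m and m.end(1) > last_end:
        maximal.append(m.group(1))
        last_end = m.end(1)

print(maximal)
```

As with the rest of this answer, this is meant to show what is possible rather than something to put in production code.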