I have two sequences, AAAAAAAAAGAAAAGAAGAAG and AAAGAAG. The correct answer is AAGAAG.
But my code gives AA.
Sometimes the two strings will be in this order: AAAGAAG, AAAAAAAAAGAAAAGAAGAAG.
Here is my code:
```python
def longestSubstringFinder(string1, string2):
    string1=string1.strip()
    string2=string2.strip()
    answer = ""
    len1=len(string1)
    len2=len(string2)
    if int(len1)>1 and int(len2)>1:
        for i in range(1,len1,1):
            match = ""
            for j in range(len2):
                if len1>len2:
                    if i+j<len1 and (string1[i+j]==string2[i+j]):
                        match=str(match)+str(string2[i+j])
                        print(match)
                    else:
                        if len(match)>len(answer):
                            answer=match
                        match=""
                elif len2>len1:
                    if i+j<len2 and (string1[i+j]==string2[i+j]):
                        match=str(match)+str(string2[i+j])
                        print(match)
                    else:
                        if len(match)>len(answer):
                            answer=match
                        match=""
    return(answer)
```
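Answer 0 (score: 4)
Get all substrings of both strings, take the intersection of the two sets of substrings, then pick the longest string in that intersection.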
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i,length)]
strA = 'AAAAAAAAAGAAAAGAAGAAG'
strB = 'AAAGAAG'
intersection = set(get_all_substrings(strA)).intersection(set(get_all_substrings(strB)))
print(max(intersection, key=len))
>> 'AAAGAAG'
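The number of substrings grows quadratically with the length of each string, so for longer inputs a dynamic-programming table may scale better than building both sets. The following is a minimal sketch of that alternative technique, not the approach used above; the name `longest_common_substring` is made up for illustration.

```python
def longest_common_substring(a, b):
    """Return one longest substring shared by a and b (hypothetical helper)."""
    # prev[j] is the length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    best_len = 0
    best_end = 0  # exclusive end index of the best match inside a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended just before these two characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# The argument order does not matter for this input.
print(longest_common_substring('AAAAAAAAAGAAAAGAAGAAG', 'AAAGAAG'))  # AAAGAAG
print(longest_common_substring('AAAGAAG', 'AAAAAAAAAGAAAAGAAGAAG'))  # AAAGAAG
```

Only two rows of the table are kept at a time, so memory stays proportional to the length of `b`.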
Answer 1 (score: 1)
A few weeks ago I stumbled upon the difflib package in Python, which is a great fit for this kind of job.
Here is a solution to your problem:
import difflib
matcher = difflib.SequenceMatcher()
str1 = 'AAAGAAG'
str2 = 'AAAAAAAAAGAAAAGAAGAAG'
matcher.set_seq2(str2)
matcher.set_seq1(str1)
m = matcher.find_longest_match(0, len(str1), 0, len(str2))
print("Longest sequence of {} found in {}: {}".format(str1, str2, str1[m.a: m.a+m.size]))
# Longest sequence of AAAGAAG found in AAAAAAAAAGAAAAGAAGAAG: AAAGAAG
print(str2[:m.b]+'|'+str2[m.b:m.b+m.size]+'|'+str2[m.b+m.size:])
# AAAAAAAAAGA|AAAGAAG|AAG
str1 = 'AGAG'
matcher.set_seq1(str1)
m = matcher.find_longest_match(0, len(str1), 0, len(str2))
print("Longest sequence of {} found in {}: {}".format(str1, str2, str1[m.a: m.a+m.size]))
# Longest sequence of AGAG found in AAAAAAAAAGAAAAGAAGAAG: AGA
print(str2[:m.b]+'|'+str2[m.b:m.b+m.size]+'|'+str2[m.b+m.size:])
# AAAAAAAA|AGA|AAAGAAGAAG
str1 = 'XXX'
matcher.set_seq1(str1)
m = matcher.find_longest_match(0, len(str1), 0, len(str2))
print("Longest sequence of {} found in {}: {}".format(str1, str2, str1[m.a: m.a+m.size]))
# Longest sequence of XXX found in AAAAAAAAAGAAAAGAAGAAG:
print(str2[:m.b]+'|'+str2[m.b:m.b+m.size]+'|'+str2[m.b+m.size:])
# ||AAAAAAAAAGAAAAGAAGAAG
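SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use set_seq2() to set the commonly used sequence once and call set_seq1() repeatedly, once for each of the other sequences.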
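As a rough illustration of that reuse pattern, the sketch below sets the long sequence once with set_seq2() and loops over several shorter ones with set_seq1(). The variable names `reference`, `queries` and `results` are illustrative only.

```python
import difflib

reference = 'AAAAAAAAAGAAAAGAAGAAG'
queries = ['AAAGAAG', 'AGAG', 'XXX']  # illustrative inputs

matcher = difflib.SequenceMatcher()
matcher.set_seq2(reference)  # the shared sequence is analysed and cached once

results = {}
for query in queries:
    matcher.set_seq1(query)  # only the first sequence changes per iteration
    m = matcher.find_longest_match(0, len(query), 0, len(reference))
    results[query] = query[m.a:m.a + m.size]

print(results)
# {'AAAGAAG': 'AAAGAAG', 'AGAG': 'AGA', 'XXX': ''}
```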
It is also very fast!
I timed @AK47's excellent solution and got 10000 loops, best of 3: 85.2 µs per loop
My solution timed at 10000 loops, best of 3: 31.6 µs per loop
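The figures above look like IPython `%timeit` output. For anyone who wants to repeat the comparison with the standard library only, here is a sketch using `timeit`. The wrapper names `via_substring_sets` and `via_difflib` are hypothetical, the exact setup that was originally timed is not known, and the numbers will differ from machine to machine.

```python
import difflib
import timeit

STR_A = 'AAAAAAAAAGAAAAGAAGAAG'
STR_B = 'AAAGAAG'


def via_substring_sets(a, b):
    # Approach from the other answer: intersect all substrings of both strings.
    def subs(s):
        return {s[i:j + 1] for i in range(len(s)) for j in range(i, len(s))}
    common = subs(a) & subs(b)
    return max(common, key=len) if common else ''


def via_difflib(a, b):
    # Approach from this answer: let SequenceMatcher find the longest block.
    # Assumption: the matcher is rebuilt on every call, which may not match
    # what was measured originally.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


LOOPS = 1000
for func in (via_substring_sets, via_difflib):
    seconds = timeit.timeit(lambda: func(STR_A, STR_B), number=LOOPS)
    print('{}: {:.1f} us per call'.format(func.__name__, seconds / LOOPS * 1e6))
```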