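I am trying to write a pattern-matching function. I want to check whether a string ends with any "slice" of a particular substring, then return a dictionary containing a boolean and the matching string (if any).
If str1 is:
ABCD
and str2 is:
HDNABCD
or str2 is:
HDNSHAB
then the function should return: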
results = {'match': True, 'string': '<matching string>'}
(Even a string that ends with just the single character A should return True.)
A non-match should return:
results = {'match': False, 'string': ''}
This is what I have so far.
def matchpatend(str1, str2):
    '''Find any substring at the end of a string'''
    index = len(str1)
    while index > 0:
        index = index - 1
        if str2.endswith(str1):
            result = {'match': True,
                      'string': str(str1)}
            return result
        elif str2.endswith(str1[:index]):
            result = {'match': True,
                      'string': str(str1[:index])}
            return result
Here is how it is used in the main body of the program.
adapter_seq_1 = 'GACTGCAT'
with open(fastq, 'r') as in_f_obj, open(new_file_1, 'w') as out_f_obj:
    line_count = 0
    id_seq = ''
    base_seq = ''
    for line in in_f_obj:  # Read the fastq file line by line
        line_count += 1
        if line_count % 4 == 1:  # Find the read ID line.
            id_seq = line.rstrip()  # Store the read ID line.
        elif line_count % 4 == 2:  # Find the sequence line.
            base_seq = line.rstrip()  # Store the sequence line.
            results = matchpatend(adapter_seq_1, base_seq)
            if results['match'] is True:
                out_f_obj.write("{}\n{}\nAdapter contamination: {}\n".format(id_seq, base_seq, results['string']))
            elif results['match'] is False:
                break
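The code correctly writes matches and their matching strings, but it also writes non-matches, with blank strings, to the output file.
How can I stop the program from writing False matches? Is there a better way to write this function?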
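Answer 0 (score: 0)
Here is a solution that first generates all leading substrings of the pattern in descending order of length (since I assume you want to match the longest possible substring?), then iterates over them, checking each in turn.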
def leading_substrings(string):
    return (string[:i] for i in range(len(string), 0, -1))

def match_pattern_end(pattern, sequence):
    for sub in leading_substrings(pattern):
        if sequence.endswith(sub):
            return {"match": True, "string": sub}
    return {"match": False, "string": ""}

print(match_pattern_end("ABCD", "HDNABCD"))  # {'match': True, 'string': 'ABCD'}
print(match_pattern_end("ABCD", "HDNHABC"))  # {'match': True, 'string': 'ABC'}
print(match_pattern_end("ABCD", "HDNSHCD"))  # {'match': False, 'string': ''}
print(match_pattern_end("ABCD", "HDNSHAB"))  # {'match': True, 'string': 'AB'}
print(match_pattern_end("ABCD", "HDNSAGH"))  # {'match': False, 'string': ''}
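As for why blank matches end up in the output file: judging from the posted code only (I have not run it against real data), the loop in matchpatend eventually tests str1[:0], which is the empty string, and every string ends with the empty string, so a "match" with string '' is returned. Below is a minimal sketch of the main loop that writes only real matches. The helper name write_adapter_hits and the in-memory sample reads are hypothetical, made up for illustration, and match_pattern_end is restated in compact form so the block runs on its own.

```python
import io


def match_pattern_end(pattern, sequence):
    """Return the longest leading slice of pattern that sequence ends with."""
    # Stop at length 1 so the empty slice pattern[:0] is never tested.
    for end in range(len(pattern), 0, -1):
        prefix = pattern[:end]
        if sequence.endswith(prefix):
            return {"match": True, "string": prefix}
    return {"match": False, "string": ""}


def write_adapter_hits(in_f_obj, out_f_obj, adapter_seq):
    """Hypothetical helper: write only reads that match; return the hit count."""
    hits = 0
    id_seq = ""
    for line_count, line in enumerate(in_f_obj, start=1):
        if line_count % 4 == 1:      # read ID line
            id_seq = line.rstrip()
        elif line_count % 4 == 2:    # sequence line
            base_seq = line.rstrip()
            results = match_pattern_end(adapter_seq, base_seq)
            if results["match"]:     # non-matches are skipped, not written
                out_f_obj.write(
                    "{}\n{}\nAdapter contamination: {}\n".format(
                        id_seq, base_seq, results["string"]
                    )
                )
                hits += 1
    return hits


# Every string "ends with" the empty slice, which explains the blank matches.
print("HDNSAGH".endswith("ABCD"[:0]))  # True

# Made-up sample data: two FASTQ records held in memory instead of a file.
fastq_text = "@read1\nTTTTGACT\n+\nIIIIIIII\n@read2\nCCCCCCCC\n+\nIIIIIIII\n"
out = io.StringIO()
hits = write_adapter_hits(io.StringIO(fastq_text), out, "GACTGCAT")
print(hits)            # 1
print(out.getvalue())  # only @read1 is written, with contamination GACT
```

Note that this sketch skips non-matching reads and keeps going, whereas the break in the question's loop would stop reading the file at the first non-match; which behaviour you want is your call.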