How do I stop my pattern-matching function from reporting non-matches?

Time: 2019-03-29 12:35:17

Tags: python python-3.x

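I am trying to write a pattern-matching function. I want to check whether a string ends with any leading "slice" (prefix) of a given substring, and then return a dictionary containing a boolean and the matching string (if any).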

If str1 is:

ABCD

and str2 is:

HDNABCD

or str2 is:

HDNSHAB

then the function should return:

results = {'match': True, 'string': '<matching string>'}

Even a string that ends with a single A character should return True.

A non-match should return:

results = {'match': False, 'string': ''}

Here is what I have so far:

def matchpatend(str1, str2):
    '''Find any substring at the end of a string'''
    index = len(str1)
    while index > 0:
        index = index - 1
        if str2.endswith(str1):
            result = {'match': True,
                      'string': str(str1)}
            return result
        elif str2.endswith(str1[:index]):
            result = {'match': True,
                      'string': str(str1[:index])}
            return result

Here is how it is used in the main body of the program:

adapter_seq_1 = 'GACTGCAT'
with open(fastq, 'r') as in_f_obj, open(new_file_1, 'w') as out_f_obj: 
    line_count = 0
    id_seq = ''
    base_seq = ''
    for line in in_f_obj:  # Read the fastq file line by line
        line_count += 1
        if line_count % 4 == 1:  # Find the read ID line.
            id_seq = line.rstrip()  # Store the read ID line.
        elif line_count % 4 == 2:  # Find the sequence line.
            base_seq = line.rstrip()  # Store the sequence line.

            results = matchpatend(adapter_seq_1, base_seq)
            if results['match'] is True:
                out_f_obj.write("{}\n{}\nAdapter contamination: {}\n".format(id_seq, base_seq, results['string']))
            elif results['match'] is False:
                break

The code correctly writes matches and their matching strings, but it also writes non-matches, with an empty string, to the output file.
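The blank entries can likely be traced to the last pass of the `while` loop: when `index` reaches 0, `str1[:0]` is the empty string, and every string ends with the empty string, so the `elif` branch reports a match with `'string': ''`. The snippet below reproduces this with the question's function copied unchanged, then shows one minimal fix that keeps the original loop structure. `matchpatend_fixed` is a hypothetical name used only for this illustration; it tests each prefix before decrementing, so the empty prefix is never tried, and it returns the no-match dictionary explicitly after the loop (the original falls through and returns `None` there, for example when `str1` is empty).

```python
def matchpatend(str1, str2):
    '''The question's function, copied unchanged.'''
    index = len(str1)
    while index > 0:
        index = index - 1
        if str2.endswith(str1):
            result = {'match': True,
                      'string': str(str1)}
            return result
        elif str2.endswith(str1[:index]):
            result = {'match': True,
                      'string': str(str1[:index])}
            return result


# On the final pass index == 0, so the slice being tested is empty...
print(repr('ABCD'[:0]))                 # ''
# ...and every string ends with the empty string.
print('HDNSAGH'.endswith(''))           # True
# Hence a read with no adapter at all is still reported as a match.
print(matchpatend('ABCD', 'HDNSAGH'))   # {'match': True, 'string': ''}


def matchpatend_fixed(str1, str2):
    '''Hypothetical minimal fix: test prefixes of length len(str1) down to 1 only.'''
    index = len(str1)
    while index > 0:
        if str2.endswith(str1[:index]):  # longest remaining prefix first
            return {'match': True, 'string': str1[:index]}
        index = index - 1
    # Only reached when no non-empty prefix matched.
    return {'match': False, 'string': ''}


print(matchpatend_fixed('ABCD', 'HDNSAGH'))  # {'match': False, 'string': ''}
print(matchpatend_fixed('ABCD', 'HDNSHAB'))  # {'match': True, 'string': 'AB'}
```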

How can I stop the program from writing the false matches? Is there a better way to write this function?

1 answer:

Answer 0 (score: 0)

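Here is a solution that first generates all leading substrings of the pattern in decreasing order of length (since I assume you want to match the longest possible substring?), then iterates over them, checking each one in turn.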

def leading_substrings(string):
    return (string[:i] for i in range(len(string), 0, -1))

def match_pattern_end(pattern, sequence):
    for sub in leading_substrings(pattern):
        if sequence.endswith(sub):
            return {"match": True, "string": sub}
    return {"match": False, "string": ""}

print(match_pattern_end("ABCD", "HDNABCD"))  # {'match': True, 'string': 'ABCD'}
print(match_pattern_end("ABCD", "HDNHABC"))  # {'match': True, 'string': 'ABC'}
print(match_pattern_end("ABCD", "HDNSHCD"))  # {'match': False, 'string': ''}
print(match_pattern_end("ABCD", "HDNSHAB"))  # {'match': True, 'string': 'AB'}
print(match_pattern_end("ABCD", "HDNSAGH"))  # {'match': False, 'string': ''}
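To connect this back to the FASTQ loop in the question: once non-matches come back as `{'match': False, ...}`, the loop only has to skip them. Note that the question's `elif results['match'] is False: break` would then stop reading the whole file at the first read without adapter; `continue` skips just that read. The sketch below assumes the four-line FASTQ record layout that the question's `line_count % 4` logic already relies on. `write_contaminated_reads` is a hypothetical helper name, the matcher is repeated (with `leading_substrings` inlined) so the block runs on its own, and in-memory `io.StringIO` objects with made-up reads stand in for the real input and output files.

```python
import io


def match_pattern_end(pattern, sequence):
    """Return the longest leading slice of `pattern` that `sequence` ends with."""
    # Prefixes from longest to shortest; range stops at 1, so '' is never tested.
    for i in range(len(pattern), 0, -1):
        if sequence.endswith(pattern[:i]):
            return {"match": True, "string": pattern[:i]}
    return {"match": False, "string": ""}


def write_contaminated_reads(in_f_obj, out_f_obj, adapter_seq):
    """Hypothetical helper: write only reads whose sequence ends in part of the adapter."""
    id_seq = ""
    # enumerate(..., start=1) reproduces the question's line_count numbering.
    for line_count, line in enumerate(in_f_obj, start=1):
        if line_count % 4 == 1:    # read ID line
            id_seq = line.rstrip()
        elif line_count % 4 == 2:  # sequence line
            base_seq = line.rstrip()
            results = match_pattern_end(adapter_seq, base_seq)
            if not results["match"]:
                continue  # skip this read only; `break` would end the whole loop
            out_f_obj.write("{}\n{}\nAdapter contamination: {}\n".format(
                id_seq, base_seq, results["string"]))


# Made-up reads for illustration only.
fastq_text = (
    "@read1\nTTTTGACTG\n+\nIIIIIIIII\n"
    "@read2\nTTTTTTTTT\n+\nIIIIIIIII\n"
    "@read3\nAAAAGACTGCAT\n+\nIIIIIIIIIIII\n"
)
out = io.StringIO()
write_contaminated_reads(io.StringIO(fastq_text), out, "GACTGCAT")
print(out.getvalue())

# With real files, as in the question:
# with open(fastq, 'r') as in_f_obj, open(new_file_1, 'w') as out_f_obj:
#     write_contaminated_reads(in_f_obj, out_f_obj, adapter_seq_1)
```

Here only `@read1` (ending in the partial adapter `GACTG`) and `@read3` (ending in the full `GACTGCAT`) are written; `@read2` is skipped without ending the loop.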