python-查找主题-输入:具有10个序列和10个主题的.txt文件

时间:2019-02-07 18:50:53

标签: python-3.6 brute-force motif

当我仅使用一个输入来运行我的BruteForce函数时,它就可以工作并且结果正确。

def BruteForce(s, t):
    occurrences = []
    for i in range(len(s)-len(t)+1): # loop over alignment
        match = True
        for j in range(len(t)): # loop over characters
            if s[i+j] != t[j]:  # compare characters
                match = False   # mismatch
                break
        if match:   # allchars matched
            occurrences.append(i)

    print(occurrences)

t ='CAACTTCCA'

S = 'GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC'

BruteForce(s,t)

[2、9、48、152、175、205、212、261、277、284、300、307、370、410、607、623、717、735、751、758、792、844]

但是当我在以下循环中使用它时,它什么也不返回!我已经使用了print()来查看它是否从input_2.txt文件中读取了字符串和图案,并且确实读取了出现的空列表[]。

with open('input_2.txt', 'r') as input_2:
    nt = input_2.readline() #nt is the no of strings+motifs in 1st line
    nt = int(nt) #in this case it is 10

    for i in range(nt):
        s = input_2.readline()
        print(s)
        t = input_2.readline()
        print(t)
        BruteForce(s, t)

能否让我知道这里需要进行哪些更改?非常感谢,Beeta

input_2.txt

10

GACAACTTCCAACTTCCAACTTCCCGTCCCAACTTCACAACTTCGGCCCAACTTCCATGCAACTTCACCATCAACTTCGCTCGAAGCTGCCTTCCACTCCAACTTCACAACTTCCTCAACTTCCTCACCAACTTCAGCAACTTCTCTAGGGCCAACTTCCAACTTCTCAACTTCTCAACTTCCAACTTCCGACAACTTCTCCTGGCAACTTCCAACTTCCAACTTCAATACAACTTCGCAGACAACTTCCGCAACTTCGAACAACTTCCAACTTCCCCAACTTCCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCCCAACTTCAGATAGCAACTTCGATCTTACACAACTTCACGCAACTTCTCCAACTTCCAACTTCTGTGCAACTTCTCTGAACAACTTCCTCAACTTCCAACTTCGCAACTTCCCCAACTTCCTCAACTTCATGCAACTTCGAGGCAACTTCCCAACTTCGCAACTTCCTATTCCCAACTTCTGTGGCAACTTCTCAACTTCTGGACAACTTCTATGCCCAACTTCACAACTTCCCCAACTTCTTTACAACTTCGACAACTTCATCAACTTCTAGTCAACTTCTGGTCCAACTTCCAACTTCCCCAACTTCCAAAGTGCCGCAACTTCGTAACAACTTCACGCGCTCAACTTCAACCAACTTCTTTTCCCGCAACTTCGCAACTTCACAACTTCTAATCAACTTCCAACTTCGGATCAACTTCCAACTTCGCCAACTTCCAACTTCCAACTTCTCCAGGGACAACTTCAAGTACAACTTCCAACTTCGCAACTTCACAACTTCCCAACTTCGCAACTTCTACACGCAACTTCCAACTTCTGGTCCCAACTTCATCAACTTCAGTCAACTTC

CAACTTCCA

ATTGTGTATCGCTATGTATCGTGTATCGGATTTTGTATCGTGTATCGGTGTATCGTGTATCGTGTATCGTATCACTGTATCGTTGTATCGTAGCGTTGTATCGTAATGTATCGCTCTGTATCGTGTATCGGGTTTGTATCGATGTGTATCGCTGTATCGGTGTATCGGTGTATCGCTTGTATCGACAGCGCTTGTATCGTGTATCGACCTGTATCGGTGTATCGTGTATCGAATGTATCGTTGTATCGAATTGTGTATCGTGTATCGTGTATCGTGTATCGTATGTATCGACTGTATCGCTGTATCGTAGCCTGTATCGTGTATCGGTGTATCGGTGTATCGGCGTGTATCGAATGTATCGTTGTATCGCTGTATCGTGTGCTGTGTATCGGTGTATCGACCCGTTGTATCGTGTATCGTGGGTAAGTGTATCGTTGTATCGTAGTGTATCGTGTATCGTGTATCGTCATGGTATTGTATCGTTATAGCTGTATCGCTCGGCTGTATCGTATGTATCGTGTATCGTGTATCGCATGTATCGGTGTATCGCTGTATCGACCATGTATCGTGTATCGGTGTATCGATGTATCGCTTCCATTAGAAATGTATCGTATGTATCGTGTATCGCTGTATCGTTTGTATCGCATGTATCGATTGTGTATCGTGTATCGTTGTATCGTGTATCGTGTATCGTTGTATCGTAATAACGATGTATCGAATGATGTATCGTTGTGATGTATCGTGTATCGACTGTATCGATGTATCGGGTTGTATCGTGTATCGTGTATCGTAGATGTATCGAGAGCCATTGTATCGCCTTGTATCGCGTGTATCGGTGTGTATCGTTTGTATCGTGTTTGAATGTATCGCTGTATCG

TGTATCGTG

TTTCAGCGTGTTTGGCTTTCAGCACTACTTTCAGCGGTTTCAGCTTTTTCAGCTTTTCAGCTATTTTCAGCTTTCAGCCATCTTTCAGCCCAGGTTTCAGCTGTTTCAGCTTTCAGCGAGTTTCAGCTTTCAGCGCGGGGATTTCAGCGTATTTCAGCGACATTTCAGCAGTTTCAGCTGGAGTGAAGCCGTTTCAGCGCTTTTCAGCACCATTTCAGCTAGTAGGTTTCAGCTTTTCAGCCTTTTCAGCTTTCAGCTATTTCAGCTCTAGAAGTCGTTTCAGCTTTCAGCTTCCTTTTCAGCTTTCAGCTTTCAGCCTTTCAGCTTTCAGCAAACTTTCAGCGATTTCAGCTTTCAGCTTTCAGCTTTCAGCTTTCAGCATTTCAGCTTTCAGCCGTTTTCAGCGACTCTTTCAGCGTTTCAGCTTTCAGCTGATCGTTTTCAGCTTTTCAGCCGGTTTCAGCGAAAGTTGGTCTTTTCAGCCAAATTTTCAGCTTTCAGCTTTCAGCGTTTCAGCTTTTCAGCCATTTCAGCCTATTTTCAGCAATCTTTCAGCAATTTTCAGCGCAATTTCAGCCAAAATTTCAGCATTTCAGCTTTCAGCGTAGTTTCAGCTTTCAGCTGAGCCTTTCAGCTTTTCAGCTCTCTTTTCAGCTTTCAGCGATTTCAGCCTTTTCAGCGCTTTCAGCACCACGCCTTTCAGCTTTCAGCTTTTCAGCCTTTCAGCCTATTTTCAGCGGTTTCAGCGTTTCAGCTTTCAGCACTTTCAGCTTTCAGCTTTTCAGCGGAAGGTTTTCAGCA

TTTCAGCTT

AGCATCCGACTGCATCCGTGCATCCGACGGCATCCGGCATCCGCAGCATCCGCACAGCCGCATCCGAGCCCGCATCCGAAGCATCCGGCATCCGGGCATCCGGCCGCATCCGGCATCCGGGCATCCGTGCATCCGGCATCCGTGTGCATCCGGCATCCGGGGTGCTTGCATCCGCGCATCCGTGCATCCGCGCATCCGGCATCCGCAGTCCGCATCCGCGCATCCGGCATCCGAGCATCCGTATTCGCATCCGTCGAGCATCCGTACGTCGCATCCGTGCATCCGGGAGAGGCATCCGGGCATCCGCGCATCCGGTGGACTATAACGCGCATCCGGCGAGCATCCGCCGCGGCATCCGTGCATCCGGTACGGCATCCGGGCTGAGGCATCCGCTCCCGGGGCATCCGCGGCGCATCCGCACGCGCATCCGGCATCCGAGCATCCGTTGCATCCGGAGCATCCGTAGCATCCGATAGCATCCGCAAGCTTGCATCCGGCATCCGCTACAGCATCCGGTGGCATCCGAAGCATCCGCGCATCCGCGGATTGCATCCGCACACGGCATCCGTAGCATCCGTGCATCCGGAGTGCATCCGCCGGGCGCATCCGCTGCATCCGCAGCGCATCCGAGCATCCGATCTTGCATCCGGCATCCGGGAATGCATCCGGGACTGCATCCGGTCTTAAGGGTGCATCCGAATGCATCCGCTGTAGCATCCGGCATCCGAGTAAGCATCCGAGTTCTGCATCCGGAAGCAGCATCCGGCATCCGGACACCAGCATCCGCGCATCCGGGGCGAGCATCCGAGCATCCGGCATCCGGGGCAAGTGGCATCCGGCATCCGTCGTGCATCCGGGCATCCGAA

GCATCCGGC

GACGAGCTGACGAGCGTGACGAGCAGGCGACGAGCGACGAGCATGCGGTGACGAGCGGACGAGCGACGACGAGCGCGGACGAGCGACGAGCGGACGAGCTCCGACGAGCGACGAGCACGACGAGCGCCCGACGAGCGACGAGCATGATGACGAGCGACGAGCTTAGACGAGCATGCTGACGAGCTGACGAGCTGACGAGCTCGTACATCGACGAGCGGACGAGCAGCCCGATAAGCCTTCGACGAGCTGACGAGCCACGACGAGCGACGAGCGACGAGCGGACGAGCGGGACGAGCGTCGGACGAGCGGACGAGCAGGACGAGCACCTCAATCGACGAGCTGACGAGCGGACGAGCCGACGAGCTGAACTGAAGGACGAGCCTACGTTCTAACGTGCCGTCACTGACGAGCGACGAGCGGACGAGCGATGGACGAGCTGACGAGCGCAGGACGAGCGAGAAGGCTGACGAGCAGACGAGCTGACGAGCTCGACGAGCAGACGAGCGGACGAGCTTTGATAAGACGAGCACGTCGACGAGCCTCGTGACGAGCTGACGAGCGACGAGCTAGACGAGCGACGAGCTGACGAGCGACGAGCACTAAACGCGACGAGCTCGACGACGAGCATGGACGAGCGACGAGCTAGGACGAGCGGCGACGAGCAGACGAGCGACGAGCTGACGAGCTTATAGACGAGCCTGACGACGAGCCAAGACGAGCGACGAGCGGACGAGCGACGAGCACGACGAGCCGACGAGCCGCGGACGAGCCGTAGACGAGCCAATCATTAAGACGAGCCGACGAGCCATTTGGGGACGAGCCTCCTCGACGAGCCATAAGACGAGCTGACGAGCCATGACGAGCATGCCCGACGAGC

GACGAGCGA

GAGGACCCCGGACCCCTGGACCCCAGGACCCCGACGGTGGGGCGGACCCCGAGAAGGACCCCTGGACCCCCGCAGGACCCCTTTATCGGACCCCGGACCCCGGACCCCCGGACCCCGGCTGGACCCCGTCGTAAGGACCCCGAATCGGACCCCTAGGACCCCAGGGACCCCTCCCCGGTTGGACCCCCTGGACCCCCTTGAGAGGGACCCCGGACCCCACGCCGTGCTTAAGGACCCCACTATGGACCCCATACGGACCCCGGACCCCGATCAGAGCGACCAGGACCCCCGGCCTGGACCCCTCGGACCCCAGAAGGACCCCTCGGACCCCTGGACCCCGGACCCCACAGGGACCCCGGACCCCTAGCCGCGGTGGACCCCCGGACCCCCGCAGATGGGGACCCCTCATGCGGACCCCCTGACGGACCCCTGGACCCCCGGGACCCCCAGGGACCCCATTTCGAGGACCCCATGGGGACCCCAAGCTGGACCCCGGACCCCTGGGACCCCCGGACCCCGGACCCCGGACCCCATAGGACCCCCGTTTGTTGCCATGGACCCCTTGGGACCCCGAGTCGGACCCCGGACCCCCAGGACCCCACAGGACCCCGGGACCCCGGACCCCATCGATCGGACCCCGGACCCCGGACCCCTACGGACCCCGCTAAGGGACCCCGGACCCCGTGGACCCCCACGGACCCCTAGGACCCCGGGACCCCGGACCCCGGACCCCAGTTGGACCCCCCTTAGGACCCCAGGACCCCACATGAGGACCCCGGGACCCCGGGACCCCTGGACCCCTGGACCCC

GGACCCCGG

AAACCTTCGGACCTTCGTACCTTCGACCTTCGTGAAACCTTCGGACCTTCGTGACCTTCGGGTGACCTTCGGGTACTCACCTTCGGCACCTTCGTATACCTTCGAGACCTTCGCTGGACTGTAAACCTTCGATCGTTATTTTCTACCTTCGCACACCTTCGACCTTCGAGCCGAACCTTCGGACTGGCCTCACCTTCGATCACCTTCGACCTTCGAACCTTCGACCTTCGACCATACCTTCGACCTTCGGACCTTCGGACCTTCGCACCTTCGAACCTTCGCCACCTTCGACCTTCGAACCTTCGGTGGGCCACCTTCGTTACCTTCGAACCTTCGGGCACCTTCGACCTTCGGCTTACACCTTCGGACCTTCGATGACTGACCTTCGAAACCTTCGACCTTCGACCTTCGATTCCACACCTTCGACCTTCGACACCTTCGACCTTCGCGTTCACCTTCGACCTTCGACCTTCGGGCCAATACCTTCGACCTTCGGCCTAATACCAACCTTCGAACCTTCGAACCTTCGGGTGACCTTCGGACCTTCGCGCACACCTTCGCACTTACCTTCGACCTTCGGTGACCTTCGACCTTCGTGCACCTTCGCGTCTCACCTTCGCACCTTCGTTACCTTCGAACCTTCGACCTTCGACAACCTTCGTACCTTCGCTACCTTCGGGAACCTTCGATGTCCTACCTTCGACCTTCGCGACCTTCGACCTTCGTAACCTTCGAACCTTCGACCACCTTCGAACCTTCGAAGGAAGGACCTTCGAATCATTACCTTCGTTACCTTCGTAGTACCTTCGCCGACCTTCGGACCTTAACCTTCGTGA

ACCTTCGAC

AAGTGGTATTCCCAAAAAGATATTCCCGTTATTCCCGCTATAAATATTCCCAACTGTTTATTCCCACTTATTCCCGTATTCCCGTATTCCCCGGGCCTTTTATTCCCCATTCTATTCCCGTATTCCCATATTCCCACTTATTCCCCTATTCCCATATTCCCTATTCCCTATTCCCGTAAGTATTCCCCCCCTTATTCCCCCATTTGTCCATATTCCCTCAATATTCCCTATTCCCCTATTCCCGCTTCCCCAGTCTATCCCAAATTGTATTCCCAATATTCCCCTTTATTCCCTTTATTCCCGTATTCCCTATTCCCGTGGCTATTCCCCCGACGTTATTCCCTCTATATTCCCGTCTCAAAGTATTCCCTTATTCCCAGTATTCCCATTCGCCTGGTATTCCCGTATTCCCGGTATTCCCGTTTATTCCCATCTATTCCCATTATTCCCTGGTATTCCCGCTATTCCCGCAACGGTATTCCCTATTCCCTATTCCCGTATTCCCAATATTCCCTTCCTATTCCCGTATTCCCTAGCTATTCCCATATTCCCGATATTCCCTATTCCCGTATTCCCTATTCCCTTATTCCCTATTCCCTAGTTATTCCCAAACAGGCTATTCCCTATTCCCTATTCCCATATTCCCTTATTCCCTACGTATTCCCTCTTAAAAATATTATTCCCCTATTCCCTATTCCCGTATTCCCCGAAAGTATTCCCCCATATTCCCAATTATATTCCCGGTATTCCCCCGTATTCCCTATTCCCTTATTCCCATATTCCCATATTCCCAAAGTTTATTCCCGTATTCCCCACTATTCCCATAGATTATTCCCTATTCCCTATTCCCTTATTCCCCTATTCCCTATTCCCTTATTCCCGTATTCCCAGTTTATTCCC

TATTCCCTA

GAAAGGATAAAGGATGTCAAAGGATGCAGATATAAAGGATAAAGGATACAAAGGATTAAGTATGTCCGAAAAAGGATAAAGGATTAAAAGGATCGCTGAACCTTACACAAAGGATAGTGAACAAAGGATTCAATAAAAGGATAAAGGATCAAAGGATCAAAAGGATACAAAGGATGCCGATGAAAGGATAAAGGATCTAAAGGATAAAGGATGAGAAAGGATTGGAAAGGATTATGAGATCAAAGGATAAAGGATGGTGTAAAAGCTAAAGGATTCGGCAAAGGATAAAGGATTAAAGGATAAAAAGGATAAAGGATGAAAGGATCCGCAGGGACCAGCAAAAGGATAAAGGATCGAATGGGTAAGAAAGGATCAAAGGATGAAAAAGGATTACTAAAGGATAAAGGATCCTGAAAGGATTACAAAGGATCTTAAAAGGATTCGGAAAGGATCCATAGGAAAAGGATAAAGGATCGCGAAAGGATAAAGGATTAAAGGATAAAGGATAAAAGGATTCAAAGGATAAAGGATAGACAAGGGAGAAAGGATGCAAAGGATGAAAGGATAAAGGATAAAGGATAAAGGATTAGGTTAAAGGATCGAAAGGATCCAAAGAGAGCGAAAAGGATGAGGACGAAAGGATCAAAGGATCCAAAGGATCAAAGGATAAAGGATTGCGATGGAAAGGATGTCAAAGGATCACCAAAGGATAAAGGATAATAAAGGATAAAGGATAAAGGATCAAAGGATTTGAAAAGGATAAAGGATCACGAAAGGATAAAAGGATTGGCAAAGGATGAAAAGGATGAAAGGATAAGCCCTCCAAAGGAT

AAAGGATAA

ATTCAAATACTTCAAATAGTCAAATATCAAATATGCTTCAAATATCAAATATCGGTCAAATAAGTCAAATAAGTCAAATATCAAATATCAAATACTCAAATATCAAATAGTGTCAAATACGAATTGGGTCAAATATCAAATATGTGTTCAAATATTCTTCAAATACTGGACACTCAAATAGGAGTCAAATATCAAATATCAAATACGTGAAGTGCTTGTCAAATATTTCAAATACTTTCAAATATTCAAATACTCAAATATGTTCAAATATCAAATATCAAATATTTCAAATATCAAATATCAAATAGTCCTCAAATAGCAAACCAGTCAAATATCAAATATGTCAAATATGCTCACGGCAACCTCAAATATCAAATATCAAATAGATGTCAAATAAGTTTCAAATATCAAATATCAAATATCAAATATCAAATATTCAAATATCAAATAGCGGGCTCAAATAAGCTCAAATACGTCAAATAGGGGGTCAAATAGATCAAATAGTCAAATATTCAAATACATCAAATAGTCAAATACAAGAACCACCGAGATCTCAAATAATCAAATATCAAATATGGTCAAATAAATATTCAAATAGGTCAAATAAGATCAAATATCAAATAAGTCGTCCATCAAATAGTCAAATAGCTCAAATAACCTCAAATATCAAATATTCAAATACCGGTCATCAAATAGGACAAATCAAATAGTCAAATAAGATCCTCTCAAATAATTCAAATAGCTGTTCAAATACTCAAATATCAAATAGTCAAATATCAAATATTCAAATATCAAATACTTCAAATATGTTCAAATAATCAAATACTCAAATATCAAATAATCAAATAATACTTCAAATATACCAAACGCTCAAATATTAGTTGGATCAAATATCTTCAAATATCAAATAA

TCAAATATC

1 个答案:

答案 0 :(得分:0)

这个简单的代码将解决您的问题,因此程序读取文件并将每一行放入列表中,从而过滤了列表,因为我需要删除'\ n',然后运行循环并检查每一项的索引。如果index是pair,则为s,t等于下一项的值。

f = open('test.txt', 'r')
all = list(filter(lambda a: a != '\n', f.readlines()))

for idx, val in enumerate(all):
    if idx % 2 == 0:
        s=val.rstrip()
        t=all[idx + 1].rstrip()
        BruteForce(s, t)