Why doesn't my double loop work?

Time: 2013-01-26 19:36:00

Tags: python csv for-loop bioinformatics

I tried this double loop and it doesn't work (see below).

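Basically, I have a list of constructs and a list of primers. Primers are tied to constructs by a "construct number" and a "part number". (Each construct is made up of several parts.) For each part there is a "forward" and a "reverse" primer. For the molecular-biology-inclined SO members: I'm basically writing a script to help me set up PCRs.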

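What I want to do: search the primer list for the primers that belong to each construct part, and join them into a master list. For example, if my primer list contains EMP792 (fw) and EMP793 (re) (on separate rows), and they are associated with part 2 of construct #1 in my construct list, I want to be able to search `primers_list` for the corresponding fw and re primers. If a construct's part has no associated primers in the list, I want to skip that construct for now.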

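My strategy: I wrote a nested for loop. For each construct in the construct list, I want it to search the primer list for the fw and re primers. I know this is inefficient, but as a beginner programmer it's the only approach I could come up with. I added some conditions that check whether primers exist for a construct by comparing the construct number and part number associated with each primer.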
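Since the nested loop is described as inefficient, here is a minimal sketch of a single-pass alternative: index the primers once in a dict keyed by (construct number, part number), then look up each construct part directly. It assumes the rows are dicts with the CSV column names shown in the data subsets below, and that parts are numbered by their row order within each construct, as the question's part counter implies. The function names `index_primers` and `pair_primers` are hypothetical, and only a few columns are copied for brevity.

```python
from collections import defaultdict


def index_primers(primer_rows):
    """Group primer rows as {(construct number, part number): {direction: row}}."""
    index = defaultdict(dict)
    for row in primer_rows:
        key = (row['construct number'], row['part number'])
        index[key][row['direction']] = row   # 'fw primer' or 're primer'
    return index


def pair_primers(construct_rows, primer_rows):
    """Attach fw/re primers to each construct part; skip parts missing either one."""
    index = index_primers(primer_rows)      # one pass over the primers
    part_counter = defaultdict(int)         # assumption: parts numbered by row order per construct
    master_list = []
    for construct in construct_rows:
        part_counter[construct['Construct']] += 1
        key = (construct['Construct'], str(part_counter[construct['Construct']]))
        pair = index.get(key, {})           # .get() does not add missing keys
        fw, rv = pair.get('fw primer'), pair.get('re primer')
        if fw is None or rv is None:
            continue                        # no associated primers: skip this part
        master_list.append({
            'construct': construct['Construct'],
            'content': construct['Content'],
            'primer1': fw['name'],
            'primer2': rv['name'],
        })
    return master_list
```

Each primer row is visited once while building the index, so the lookup per construct part no longer depends on the length of the primer list.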

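The problem I'm facing: for each construct in the list, the loop doesn't search the whole `primers_list`. It seems to skip every primer it has already compared and only compares the next primer that hasn't been compared yet. This breaks the processing: if you run the code with the relevant dataset (pasted below the code), you'll see that constructs that should print their associated primers end up with none. It's giving me a headache trying to figure out what's wrong (haha, haha...)!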

I'd appreciate any help!

CODE:

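import csv

# The three names below are used but not defined in the posted excerpt;
# these starting values are assumed so the code can run.
construct_counter = 0   # last construct number seen
part_number = 1         # part number within the current construct
master_list = []        # collected master rows

with open('constructs-to-make-shortened2.csv', 'rU') as constructs: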
    construct_list = csv.DictReader(constructs)

    with open('primers-with-notes-names.csv', 'rU') as primers:
        primers_list = csv.DictReader(primers)

        #make list of constructs for checking later on#
##        construct_numbers_list = []
##        for row in primers_list:
##            construct_numbers_list.append(row['construct number'])
##
##        print(construct_numbers_list)


        for construct in construct_list:
##            print('Currently at construct number ' + construct['Construct'])
##            print('Construct counter at ' + str(construct_counter))
##            print('Part number counter is at ' + str(part_number))
            master_row = {}
            master_row['construct'] = construct['Construct']
            master_row['strategy'] = construct['Strategy']
            master_row['construct name'] = construct['Construct Name']
            master_row['sequence'] = construct['Sequence']
            master_row['source'] = construct['Source']
            master_row['content'] = construct['Content']


            print('We are at construct number ' + str(construct['Construct']))
            print('Construct counter is at ' + str(construct_counter))
            is_next_construct = (int(construct['Construct']) > construct_counter)
            print('Are we at the next construct?')
            print(is_next_construct)

            if is_next_construct:
                part_number = 1
                construct_counter = int(construct['Construct'])
            print('Part number is now ' + str(part_number))

            for primer in primers_list:
                print(primer)


##                    print('Is primer ' + str(primer['name']) + ' associated with the construct?')
                is_associated_with_construct = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number))
##                    print(is_associated_with_construct)
                if(is_associated_with_construct == False):
                    break

                is_forward = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number) and primer['direction'] == 'fw primer')

                print('Primer ' + str(primer['name']) + ' is a forward primer?')
                print(is_forward)

                is_reverse = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number) and primer['direction'] == 're primer')

                print('Primer ' + str(primer['name']) + ' is a reverse primer?')
                print(is_reverse)

                if is_forward:
                    master_row['primer1'] = primer['name']
                    master_row['primer1 sequence'] = primer['primer sequence']
                    master_row['primer1 description'] = primer['notes']
                    master_row['primer1 length'] = primer['length']
##                        print(master_row)
                    continue

                elif is_reverse:
                    master_row['primer2'] = primer['name']
                    master_row['primer2 sequence'] = primer['primer sequence']
                    master_row['primer2 description'] = primer['notes']
                    master_row['primer2 length'] = primer['length']
##                        print(master_row)
                    part_number += 1
                    print('Part number now = ' + str(part_number) + '\n')
                    master_list.append(master_row)
                    break
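
Separately from the iterator issue discussed in the answer, the `break` in the `is_associated_with_construct == False` branch also ends the inner loop at the first primer that belongs to a different part. For construct 12, part 2, the very first primer (EMP790) is part 1, so the search stops there even if the primers are stored in a list. Below is a minimal sketch of the same nested-loop strategy with `continue` in place of that `break`, assuming `primer_rows` is a list and that all rows of one construct are contiguous in the construct CSV. `match_primers` is a hypothetical name, and only the primer names are copied for brevity.

```python
def match_primers(construct_rows, primer_rows):
    """Nested-loop matching; primer_rows must be a list so every inner loop restarts."""
    master_list = []
    current_construct = None
    part_number = 0
    for construct in construct_rows:
        # Number parts by row order within each construct (assumes contiguous rows).
        if construct['Construct'] != current_construct:
            current_construct = construct['Construct']
            part_number = 0
        part_number += 1

        master_row = {'construct': construct['Construct']}
        for primer in primer_rows:
            if (primer['construct number'] != construct['Construct']
                    or primer['part number'] != str(part_number)):
                continue  # belongs to another part: keep searching instead of breaking
            if primer['direction'] == 'fw primer':
                master_row['primer1'] = primer['name']
            elif primer['direction'] == 're primer':
                master_row['primer2'] = primer['name']
        # Keep the row only if both primers were found for this part.
        if 'primer1' in master_row and 'primer2' in master_row:
            master_list.append(master_row)
    return master_list
```

Because the part counter advances for every construct row, a part without primers no longer shifts the numbering of the parts after it, which the original `part_number += 1` inside the reverse-primer branch would do.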

DATA SUBSET (constructs) (exact sequences removed to stay within the SO character limit):

{'Sequence': '', 'Construct': '12', 'Strategy': 'Gibson', 'Content': 'Amp resistance marker', 'Source': 'pEM096', 'Construct Name': 'T7 RNAP core on BAC ori only with AmpR'}
{'Sequence': '', 'Construct': '12', 'Strategy': 'Gibson', 'Content': 'BAC origin and T7 RNAP core', 'Source': 'THSS301', 'Construct Name': 'T7 RNAP core on BAC ori only with AmpR'}
{'Sequence': '', 'Construct': '13', 'Strategy': 'Cut Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid'}
{'Sequence': '', 'Construct': '13', 'Strategy': 'Cut Gibson', 'Content': 'vioABE pathway and pSC101 ori and CmR;  digest with EcoRI and XbaI', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid'}
{'Sequence': '', 'Construct': '14', 'Strategy': 'Cut Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid, with lyc in reverse direction'}
{'Sequence': '', 'Construct': '14', 'Strategy': 'Cut Gibson', 'Content': 'vioABE pathway and pSC101 ori and CmR;  digest with EcoRI and XbaI', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid, with lyc in reverse direction'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'vioABE pathway with random nucleotide spacers', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'pSC101 origin of replication and CmR resistance marker', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '16', 'Strategy': 'Gibson', 'Content': 'P(tac)-SynZip18-T7 fragment', 'Source': 'THSS303', 'Construct Name': 'P(tac)-T7 fragment controller'}
{'Sequence': '', 'Construct': '16', 'Strategy': 'Gibson', 'Content': 'IncW backbone and TpR resistance and lacIq', 'Source': 'pEM103', 'Construct Name': 'P(tac)-T7 fragment controller'}
{'Sequence': '', 'Construct': '17', 'Strategy': 'Gibson', 'Content': 'P(tac)-SynZip18-T3 fragment', 'Source': 'THSS304', 'Construct Name': 'P(tac)-T3 fragment controller'}
{'Sequence': '', 'Construct': '17', 'Strategy': 'Gibson', 'Content': 'IncW backbone and TpR resistance and lacIq', 'Source': 'pEM103', 'Construct Name': 'P(tac)-T3 fragment controller'}

DATA SUBSET (primers):

{'part number': '1', 'direction': 'fw primer', 'name': 'EMP790', 'primer sequence': 'gtttgtcggtgaactaattCttattaccaatgcttaatcagggaggcacctatctcagcg', 'notes': 'Fw Gibson primer on pEM096 to extract Amp resistance marker', 'length': '60', 'construct number': '12'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP787', 'primer sequence': 'gatgaggatcgtttcgcatgctaaatacattcaaatatctatccgctcatgagacaataa', 'notes': 'Re Gibson primer on pEM096 to extract Amp resistance marker', 'length': '60', 'construct number': '12'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP788', 'primer sequence': 'agatatttgaatgtatttagcatgcgaaacgatcctcatcctgtctcttgatcagatctt', 'notes': 'Fw Gibson primer on THSS301 to extract BAC and R6K origins and T7 RNAP core', 'length': '60', 'construct number': '12'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP791', 'primer sequence': 'tgattaagcattggtaataaGaattagttcaccgacaaacaacagataaaacgaaaggcc', 'notes': 'Re Gibson primer on THSS301 to extract BAC origin and T7 RNAP core', 'length': '60', 'construct number': '12'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP792', 'primer sequence': 'aaggaatattcagcaatttgGTTGGGGATAGCGCTAGCTATAATAactaTCACTATAGGG', 'notes': 'Fw Gibson primer on KT-587 to extract vioABE pathway with random nucleotide spacers', 'length': '60', 'construct number': '15'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP793', 'primer sequence': 'gggcctttcttcggcacgggGTTGTAGCAGGCGTCTTTGTCAAAAAACCCCTCAAGACCC', 'notes': 'Re Gibson primer on KT-587 to extract vioABE pathway with random nucleotide spacers', 'length': '60', 'construct number': '15'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP794', 'primer sequence': 'ACAAAGACGCCTGCTACAACcccgtgccgaagaaaggcccacccgtgaaggtgagccagt', 'notes': 'Fw Gibson primer on KT-537 to extract lycopene pathway (crtE.B.I.dxs.idi)', 'length': '60', 'construct number': '15'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP795', 'primer sequence': 'gaggtcattactggatctaTcccgtgccgaagaaaggcccacccgtgaaggtgagccagt', 'notes': 'Re Gibson primer on KT-537 to extract lycopene pathway (crtE.B.I.dxs.idi)', 'length': '60', 'construct number': '15'}
{'part number': '3', 'direction': 'fw primer', 'name': 'EMP796', 'primer sequence': 'gggcctttcttcggcacgggAtagatccagtaatgacctcagaactccatctggatttgt', 'notes': 'Fw Gibson primer on KT-537 to extract pSC101 origin of replication and CmR resistance marker', 'length': '60', 'construct number': '15'}
{'part number': '3', 'direction': 're primer', 'name': 'EMP797', 'primer sequence': 'TAGCTAGCGCTATCCCCAACcaaattgctgaatattccttttcttagacgtcaggtggca', 'notes': 'Re Gibson primer on KT-537 to extract pSC101 origin of replication and CmR resistance marker', 'length': '60', 'construct number': '15'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP798', 'primer sequence': 'aaatattctgaaatgagctgttgacaattaatcatcggctcgtataatgtgtggaattgt', 'notes': 'Fw Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '16'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP799', 'primer sequence': 'attaccgcctttgagtgagccccaatgataaccccaagggaagttttagtcaaaagcctc', 'notes': 'Re Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '16'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP800', 'primer sequence': 'cccttggggttatcattggggctcactcaaaggcggtaatcagataaaaaaaatccttag', 'notes': 'Fw Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '16'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP801', 'primer sequence': 'agccgatgattaattgtcaacagctcatttcagaatatttgccagaaccgttatgatgtc', 'notes': 'Re Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '16'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP798', 'primer sequence': 'aaatattctgaaatgagctgttgacaattaatcatcggctcgtataatgtgtggaattgt', 'notes': 'Fw Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '17'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP799', 'primer sequence': 'attaccgcctttgagtgagccccaatgataaccccaagggaagttttagtcaaaagcctc', 'notes': 'Re Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '17'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP800', 'primer sequence': 'cccttggggttatcattggggctcactcaaaggcggtaatcagataaaaaaaatccttag', 'notes': 'Fw Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '17'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP801', 'primer sequence': 'agccgatgattaattgtcaacagctcatttcagaatatttgccagaaccgttatgatgtc', 'notes': 'Re Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '17'}

1 Answer:

Answer 0 (score: 4)

The problem is that you are iterating over a csv.DictReader object, which is not a list but an iterator.

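The difference is that with an iterator you can't "go back to the start". On each pass of the outer loop, the inner loop over primers_list picks up where the previous pass left off instead of starting again from the first primer.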
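A minimal reproduction of the symptom, using a tiny in-memory CSV (hypothetical contents) in place of the real primer file: the outer loop runs twice, but only the first inner loop sees any rows.

```python
import csv
import io

# Hypothetical two-row CSV standing in for primers-with-notes-names.csv.
text = "name,construct number\nEMP790,12\nEMP792,15\n"
primers_list = csv.DictReader(io.StringIO(text))   # an iterator, not a list

seen = []
for construct_number in ['12', '15']:              # outer loop, like construct_list
    # The first inner pass consumes every row, so the second pass gets nothing.
    seen.append([row['name'] for row in primers_list])

print(seen)  # [['EMP790', 'EMP792'], []]
```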

If you want to be able to loop over all the items more than once, and you have enough memory, store them in a list:

primers_list = list(csv.DictReader(primers))
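The same toy data as a list, as a small sketch: `list()` reads every row once up front, and each later loop over the list starts again at the first element.

```python
import csv
import io

# Hypothetical two-row CSV standing in for primers-with-notes-names.csv.
text = "name,construct number\nEMP790,12\nEMP792,15\n"
primers_list = list(csv.DictReader(io.StringIO(text)))  # all rows kept in memory

seen = []
for construct_number in ['12', '15']:
    # A list can be iterated any number of times, always from the start.
    seen.append([row['name'] for row in primers_list
                 if row['construct number'] == construct_number])

print(seen)  # [['EMP790'], ['EMP792']]
```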

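If you want to keep memory usage low, you can create a fresh DictReader object inside the loop each time. This adds some (probably minor) overhead at run time, and you should also move the with statement inside the loop so that the file is closed after each pass.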
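A sketch of the reopen-per-pass approach, assuming Python 3: it writes a hypothetical temporary CSV so it can run on its own, and opens the file with `newline=''` as the csv module documentation recommends, instead of the question's `'rU'` mode.

```python
import csv
import os
import tempfile

# Hypothetical demo file standing in for primers-with-notes-names.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    f.write("name,construct number\nEMP790,12\nEMP792,15\n")
    path = f.name

results = []
try:
    for construct_number in ['12', '15']:          # outer loop over constructs
        # Reopen the file and build a fresh DictReader on every pass;
        # the with statement closes the file before the next pass.
        with open(path, newline='') as primers:
            results.append([row['name'] for row in csv.DictReader(primers)
                            if row['construct number'] == construct_number])
finally:
    os.remove(path)  # clean up the demo file

print(results)  # [['EMP790'], ['EMP792']]
```

Only one row is held in memory at a time, at the cost of rereading the file once per construct.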

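Another option is to call primers.seek(0) at the end of the loop body, so that the next iteration starts reading from the beginning of the file, but I'm not sure whether that's a good hack.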
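One caveat with the seek approach: a DictReader reads the header row only once, so if you rewind the file but keep the same reader, the header line comes back as an ordinary data row. A minimal sketch that avoids this by rewinding and then building a new DictReader on each pass (an in-memory `io.StringIO` stands in for the open file):

```python
import csv
import io

# Hypothetical contents; io.StringIO supports seek() like a real text file.
primers = io.StringIO("name,construct number\nEMP790,12\nEMP792,15\n")

seen = []
for construct_number in ['12', '15']:
    primers.seek(0)                    # rewind to the start of the file
    reader = csv.DictReader(primers)   # new reader, so the header is parsed as field names again
    seen.append([row['name'] for row in reader])

print(seen)  # [['EMP790', 'EMP792'], ['EMP790', 'EMP792']]
```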