Append values at a fixed position in a list

Time: 2018-01-12 15:06:52

Tags: python

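I have files containing different elements that I want to extract and put into a list. While filling the list that will be stored in a dictionary, I need to retrieve specific pieces of information and place each one at a fixed position in the list. Any help with Python would be appreciated.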

Example of a report file in assembly_report_dir:

Assembly name:  Pav631_1.0 
Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria) 
Infraspecific name:  strain=BPIC 631 
Taxid:          11547 
BioSample:      SAMN02471966 
BioProject:     PRJNA84293 
Submitter:      University of Toronto Centre for the Analysis of Genome Evolution and Function 
Date:           2012-10-10 
Assembly type:  n/a 
Release type:   major 
Assembly level: Scaffold 
Genome representation: full 
WGS project:    AKBS01 
Assembly method: CLC 
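Each line of the report follows a "Key: value" pattern, but the whitespace after the colon is not constant (in the sample, "Organism name:" is followed by one space and "Assembly name:" by two), so matching on ':  ' exactly is fragile. A minimal sketch, assuming every field sits on its own line, that splits each line on the first colon only and strips the rest; parse_fields is a hypothetical helper name:

```python
# A few header lines copied from the example report above.
SAMPLE = """\
Assembly name:  Pav631_1.0
Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)
Infraspecific name:  strain=BPIC 631
Assembly level: Scaffold
"""


def parse_fields(text):
    """Hypothetical helper: map each 'Key: value' line to {key: value}."""
    fields = {}
    for line in text.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(':', 1)  # split on the first colon only
        fields[key.strip()] = value.strip()  # tolerate any amount of whitespace
    return fields


fields = parse_fields(SAMPLE)
print(fields['Assembly name'])       # Pav631_1.0
print(fields['Infraspecific name'])  # strain=BPIC 631
```

Once the fields are in a dictionary, the order of the lines in the file no longer matters: values are looked up by name.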

Here is what I tried:

import os

# assembly_report_dir: the directory containing the *_assembly_report.txt files
report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])

        # Full path to the report file (root is the directory os.walk is currently visiting)
        path = os.path.join(root, report)

        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
            description = []
            for line in lines:

                if line.startswith('Organism name:  '):
                    organism = line.strip().split(':  ')[-1].split(' (', 1)[0]
                    species = ' '.join(organism.split(' ')[0:2])
                    description.append(species)

                elif line.startswith('Infraspecific name:  strain='):
                    strain = line.strip().replace(' ', '').split('strain=')[-1]
                    description.append(strain)

                elif line.startswith('Assembly name:  '):
                    assembly = line.strip().split(':  ')[-1]
                    description.append(assembly)

        report_dict[accession] = description

print(report_dict)

The problem is that the last value I add to the list (assembly) ends up in the first position of the list instead of the last one.
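This comes from how list.append works: it always adds to the end, so the final order is simply the order in which the matching lines are read, and in the example report "Assembly name" is the very first line. A small sketch using three lines copied from that example (the value extraction here is simplified for illustration):

```python
# Lines in the same order as the example report: 'Assembly name' comes first.
lines = [
    'Assembly name:  Pav631_1.0',
    'Organism name:  Pseudomonas avellanae BPIC 631 (g-proteobacteria)',
    'Infraspecific name:  strain=BPIC 631',
]

description = []
for line in lines:
    # append() adds to the end, so values land in the order the lines are read
    description.append(line.split(':', 1)[1].strip())

print(description)
# ['Pav631_1.0', 'Pseudomonas avellanae BPIC 631 (g-proteobacteria)', 'strain=BPIC 631']
```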

My output is:

description = ["assembly", "species", "strain"]

I want the list to look like this:

description = ["species", "strain", "assembly"]
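One way to get a fixed order regardless of where the lines appear in the file is to collect the values by name first and only build the list at the end, reading the names in the desired order. A minimal sketch; FIELD_ORDER, extract and build_description are hypothetical names, and the species/strain cleanup mirrors the code above (genus + species, spaces removed from the strain):

```python
# Hypothetical: the positions the list should have, independent of file order.
FIELD_ORDER = ('species', 'strain', 'assembly')


def extract(line):
    """Return (name, value) for the three fields of interest, or None otherwise."""
    key, _, value = line.partition(':')
    key, value = key.strip(), value.strip()
    if key == 'Organism name':
        # Drop the '(g-proteobacteria)' part, keep genus + species only
        return 'species', ' '.join(value.split(' (', 1)[0].split()[:2])
    if key == 'Infraspecific name' and value.startswith('strain='):
        return 'strain', value[len('strain='):].replace(' ', '')
    if key == 'Assembly name':
        return 'assembly', value
    return None


def build_description(lines):
    found = {}
    for line in lines:
        item = extract(line)
        if item is not None:
            found[item[0]] = item[1]
    # Read the values back in FIELD_ORDER; a missing field becomes None
    return [found.get(name) for name in FIELD_ORDER]


lines = [
    'Assembly name:  Pav631_1.0',
    'Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)',
    'Infraspecific name:  strain=BPIC 631',
]
print(build_description(lines))  # ['Pseudomonas avellanae', 'BPIC631', 'Pav631_1.0']
```

Because the list is built from FIELD_ORDER rather than from the reading order, shuffling the input lines gives the same result.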

1 answer:

Answer 0 (score: 0)

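A very quick and dirty way to do it: since the list always has the same length, create it up front with placeholders and assign each value to its own index instead of appending, so the order no longer depends on the order of the lines in the file. This code works: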

import os

# assembly_report_dir: the directory containing the *_assembly_report.txt files
report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])

        # Full path to the report file (root is the directory os.walk is currently visiting)
        path = os.path.join(root, report)

        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
            MAX_LENGTH = 3
            # One placeholder per field; each field always goes to the same index
            description = ['null'] * MAX_LENGTH
            for line in lines:

                if line.startswith('Organism name:  '):
                    organism = line.strip().split(':  ')[-1].split(' (', 1)[0]
                    species = ' '.join(organism.split(' ')[0:2])
                    description[0] = species

                elif line.startswith('Infraspecific name:  strain='):
                    strain = line.strip().replace(' ', '').split('strain=')[-1]
                    description[1] = strain

                elif line.startswith('Assembly name:  '):
                    assembly = line.strip().split(':  ')[-1]
                    description[2] = assembly

        report_dict[accession] = description

print(report_dict)
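The same fixed-slot idea can be packaged as a function and checked against a throwaway directory. A minimal sketch, assuming the report files end in "_assembly_report.txt" and that the accession is the part of the file name before the second underscore, as in the code above; SLOTS, parse_value and build_report_dict are hypothetical names, and the file name used in the usage part is made up for illustration:

```python
import os
import tempfile

# Hypothetical mapping from a line prefix to its fixed slot in the description list.
SLOTS = {'Organism name:': 0, 'Infraspecific name:': 1, 'Assembly name:': 2}


def parse_value(prefix, line):
    value = line[len(prefix):].strip()  # whitespace after the colon may vary
    if prefix == 'Organism name:':
        return ' '.join(value.split(' (', 1)[0].split()[:2])  # genus + species
    if prefix == 'Infraspecific name:':
        return value.replace('strain=', '', 1).replace(' ', '')
    return value


def build_report_dict(report_dir):
    report_dict = {}
    for root, _dirs, reports in os.walk(report_dir):
        for report in reports:
            if not report.endswith('_assembly_report.txt'):
                continue  # ignore unrelated files
            accession = '_'.join(report.split('_')[:2])
            description = [None] * len(SLOTS)  # one slot per field, filled by index
            with open(os.path.join(root, report)) as handle:
                for line in handle:
                    for prefix, index in SLOTS.items():
                        if line.startswith(prefix):
                            description[index] = parse_value(prefix, line)
            report_dict[accession] = description
    return report_dict


# Usage: write one report (hypothetical file name) into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'GCA_000000001.1_Pav631_1.0_assembly_report.txt')
    with open(path, 'w') as out:
        out.write('Assembly name:  Pav631_1.0\n'
                  'Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)\n'
                  'Infraspecific name:  strain=BPIC 631\n')
    result = build_report_dict(tmp)

print(result)
```

Using None rather than a 'null' string as the placeholder makes a missing field easy to test for with "is None".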