Question

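I have a file containing several elements that I want to extract and put into a list. While populating the list, which will then be stored in a dictionary, I need to retrieve specific pieces of information and place them at fixed positions in the list. Any help with Python would be appreciated.

Example of a report in assembly_report_dir: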

Assembly name:  Pav631_1.0 
Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria) 
Infraspecific name:  strain=BPIC 631 
Taxid:          11547 
BioSample:      SAMN02471966 
BioProject:     PRJNA84293 
Submitter:      University of Toronto Centre for the Analysis of Genome Evolution and Function 
Date:           2012-10-10 
Assembly type:  n/a 
Release type:   major 
Assembly level: Scaffold 
Genome representation: full 
WGS project:    AKBS01 
Assembly method: CLC

Here is what I tried:

import os

report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        # e.g. 'GCA_000302915.1_Pav631_1.0_assembly_report.txt' -> 'GCA_000302915.1'
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])

        # full path to the report; joining with root also works for files in subdirectories
        path = os.path.join(root, report)

        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
            description = []
            for line in lines:

                if line.startswith('Organism name:  '):
                    organism = line.strip().split(':  ')[-1].split(' (', 1)[0]
                    species = ' '.join(organism.split(' ')[0:2])
                    description.append(species)

                elif line.startswith('Infraspecific name:  strain='):
                    strain = line.strip().replace(' ','').split('strain=')[-1]
                    description.append(strain)

                elif line.startswith('Assembly name:  '):
                    assembly = line.strip().split(':  ')[-1]
                    description.append(assembly)

        report_dict[accession] = description

print(report_dict)

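The problem is that the last item I add to the list (the assembly) ends up in the first position of the list instead of the last.

My output is:

description = ["assembly", "species", "strain"]

I would like the list to look like this:

description = ["species", "strain", "assembly"]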

Answer 1

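A rather quick and dirty approach... Since the length of the list is fixed, you can create it up front and assign each value to its index instead of appending, so this code works: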

import os

report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])

        # full path to the report; joining with root also works for files in subdirectories
        path = os.path.join(root, report)

        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
            MAX_LENGTH = 3
            # pre-fill the list so every field has a fixed slot: [species, strain, assembly]
            description = ['null'] * MAX_LENGTH
            for line in lines:

                if line.startswith('Organism name:  '):
                    organism = line.strip().split(':  ')[-1].split(' (', 1)[0]
                    species = ' '.join(organism.split(' ')[0:2])
                    description[0] = str(species)

                elif line.startswith('Infraspecific name:  strain='):
                    strain = line.strip().replace(' ','').split('strain=')[-1]
                    description[1] = str(strain)

                elif line.startswith('Assembly name:  '):
                    assembly = line.strip().split(':  ')[-1]
                    description[2] = str(assembly)

        report_dict[accession] = description

print(report_dict)
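
If you would rather not hard-code list indices, another option is to collect the fields into a dictionary first and build the list in the desired order afterwards. The following is only a sketch: the helper name parse_description is made up for illustration, it assumes each line has the "Key: value" shape shown in the question's sample, and it uses an empty string (not 'null') when a field is missing.

```python
def parse_description(lines):
    """Return [species, strain, assembly], independent of line order in the report."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so the amount of padding after it does not matter.
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()

    # 'Pseudomonas avellanae BPIC 631 (g-proteobacteria)' -> 'Pseudomonas avellanae'
    organism = fields.get('Organism name', '').split(' (', 1)[0]
    species = ' '.join(organism.split()[:2])

    # Only 'strain=...' values are used; spaces are removed as in the question's code.
    infra = fields.get('Infraspecific name', '')
    strain = infra.split('strain=', 1)[1].replace(' ', '') if infra.startswith('strain=') else ''

    assembly = fields.get('Assembly name', '')
    return [species, strain, assembly]


# Lines taken from the sample report in the question.
sample = [
    'Assembly name:  Pav631_1.0',
    'Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)',
    'Infraspecific name:  strain=BPIC 631',
]
print(parse_description(sample))
# ['Pseudomonas avellanae', 'BPIC631', 'Pav631_1.0']
```

Inside the loop over the reports you could then write report_dict[accession] = parse_description(lines). The order of the result is fixed by the return statement, so it no longer depends on the order of the lines in the file.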

Append values at a determined position in a list
