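I have files containing several elements that I want to extract and store in a list. While filling the list, which will then be stored in a dictionary, I need to retrieve specific pieces of information and place each one at a fixed position in the list. Any help with Python would be appreciated.
Example of a report in assembly_report_dir: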
Assembly name: Pav631_1.0
Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)
Infraspecific name: strain=BPIC 631
Taxid: 11547
BioSample: SAMN02471966
BioProject: PRJNA84293
Submitter: University of Toronto Centre for the Analysis of Genome Evolution and Function
Date: 2012-10-10
Assembly type: n/a
Release type: major
Assembly level: Scaffold
Genome representation: full
WGS project: AKBS01
Assembly method: CLC
Here is what I tried:
import os

report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])
        path = os.path.join(assembly_report_dir, report)  # full path to the assembly report
        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
        description = []
        for line in lines:
            if line.startswith('Organism name: '):
                organism = line.strip().split(': ')[-1].split(' (', 1)[0]
                species = ' '.join(organism.split(' ')[0:2])
                description.append(species)
            elif line.startswith('Infraspecific name: strain='):
                strain = line.strip().replace(' ', '').split('strain=')[-1]
                description.append(strain)
            elif line.startswith('Assembly name: '):
                assembly = line.strip().split(': ')[-1]
                description.append(assembly)
        report_dict[accession] = description
print(report_dict)
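The problem is that the last item I append to the list (the assembly) ends up in the first position instead of the last.
My output is:
description = ["assembly", "species", "strain"]
I want the list to look like this: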
description = ["species", "strain", "assembly"]
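Answer 0 (score: 0)
A very quick and dirty way to do it: since the length of the list is fixed, you can pre-fill it and assign each value to its position by index, so the order of the lines in the file no longer matters.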
import os

report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])
        # join with root so reports found in subdirectories resolve correctly
        path = os.path.join(root, report)
        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
        MAX_LENGTH = 3
        # placeholders: index 0 = species, 1 = strain, 2 = assembly
        description = ['null'] * MAX_LENGTH
        for line in lines:
            if line.startswith('Organism name: '):
                organism = line.strip().split(': ')[-1].split(' (', 1)[0]
                species = ' '.join(organism.split(' ')[0:2])
                description[0] = str(species)
            elif line.startswith('Infraspecific name: strain='):
                strain = line.strip().replace(' ', '').split('strain=')[-1]
                description[1] = str(strain)
            elif line.startswith('Assembly name: '):
                assembly = line.strip().split(': ')[-1]
                description[2] = str(assembly)
        report_dict[accession] = description
print(report_dict)
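
If the magic indices feel fragile, a slightly less dirty variant is to collect the values by name first and build the list in the wanted order at the end. This is only a sketch: the helper name parse_description and the 'null' placeholder for missing fields are my own choices, not something from the question, and it takes the lines of one report as input so it can be tried without any files on disk.

```python
def parse_description(lines):
    """Return [species, strain, assembly], whatever order the lines come in."""
    fields = {}
    for line in lines:
        line = line.strip()
        if line.startswith('Organism name: '):
            # drop the trailing "(g-proteobacteria)" part, keep genus + species
            organism = line.split(': ', 1)[1].split(' (', 1)[0]
            fields['species'] = ' '.join(organism.split(' ')[0:2])
        elif line.startswith('Infraspecific name: strain='):
            # same behaviour as the original code: spaces in the strain are removed
            fields['strain'] = line.split('strain=', 1)[1].replace(' ', '')
        elif line.startswith('Assembly name: '):
            fields['assembly'] = line.split(': ', 1)[1]
    # the output order is fixed here, independent of the order in the file
    return [fields.get(key, 'null') for key in ('species', 'strain', 'assembly')]


sample = [
    'Assembly name: Pav631_1.0',
    'Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)',
    'Infraspecific name: strain=BPIC 631',
]
print(parse_description(sample))
# ['Pseudomonas avellanae', 'BPIC631', 'Pav631_1.0']
```

Inside the os.walk loop you would then write report_dict[accession] = parse_description(lines) in place of the inner for loop.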