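I have a file with several thousand lines, and I want to populate a dictionary line by line. The gene (the first field) serves as the key. If the gene is already in the dictionary, the rest of the line is simply appended to its values. I want the values grouped by column rather than by row, separated by commas. This is what I have so far: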
listfile = {}
with open("Desktop/testfile", "r") as f:
    for lines in f:
        lines = lines.strip()
        gene = lines.split()[0]
        rest = lines.split()[1:]
        if gene not in listfile:
            listfile[gene] = rest
            #print gene, rest
        else:
            for items in rest:
                listfile[gene].append(items)
for items in listfile.items():
    print items
Input:
ACCA 39072094753 D 12
ACCA 983954875454 G 11
ACCA 098540980985 F 22
Output:
('ACCA', ['39072094753', 'D', '12', '983954875454', 'G', '11', '098540980985', 'F', '22'])
Expected output:
('ACCA', ['39072094753', '983954875454', '098540980985', 'D', 'G', 'F', '12', '11', '22'])
Answer 0 (score: 1)
Here is a general solution that works for any number of columns in the input file:
import collections
import itertools
genes_info = collections.defaultdict(list)
with open("testfile") as genes_file:
    for line in genes_file:
        fields = line.split()
        genes_info[fields[0]].append(fields[1:])  # Stores each row's information
# Conversion of the row-first gene information into column-first information:
for gene_info in genes_info.itervalues():
    gene_info[:] = itertools.chain(*zip(*gene_info))
print genes_info
This gives:
{'ACCA': ['39072094753', '983954875454', '098540980985', 'D', 'G', 'F', '12', '11', '22']}
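(If you need a plain dictionary instead of the nearly equivalent defaultdict, you can add genes_info = dict(genes_info) at the end.)
If you want to keep the values of each column together, use the simpler gene_info[:] = zip(*gene_info). This gives: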
{'ACCA': [('39072094753', '983954875454', '098540980985'), ('D', 'G', 'F'), ('12', '11', '22')]}
In effect, zip() converts rows into columns.
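To make the row-to-column conversion concrete, here is a minimal Python 3 sketch that uses the three sample rows from the question, held in memory rather than read from a file. The names `rows`, `columns`, and `flattened` are illustrative only and do not come from the answer's code.

```python
import itertools

# The three sample rows for gene ACCA, with the gene name already removed.
rows = [
    ["39072094753", "D", "12"],
    ["983954875454", "G", "11"],
    ["098540980985", "F", "22"],
]

# zip(*rows) groups the i-th element of every row together, i.e. it transposes.
columns = list(zip(*rows))

# chain.from_iterable flattens the column tuples into one column-first list.
flattened = list(itertools.chain.from_iterable(columns))

print(columns)
print(flattened)
```

In Python 3, zip() returns a lazy iterator, which is why the sketch wraps it in list() before printing.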
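PS: line.split() called without arguments discards leading and trailing whitespace and never returns empty strings, so the final newline is removed automatically. I therefore simplified the original line.strip().split(), in which strip() is unnecessary.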
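The whitespace behaviour of split() can be checked with a small Python 3 sketch. The sample `line` is taken from the question's input; the variable names are illustrative assumptions.

```python
line = "ACCA 39072094753 D 12\n"

# With no arguments, split() splits on runs of whitespace and ignores
# leading and trailing whitespace, including the final newline.
fields_plain = line.split()
fields_stripped = line.strip().split()

# With an explicit separator, the trailing newline is kept in the last field.
fields_explicit = line.split(" ")

print(fields_plain)
print(fields_plain == fields_stripped)
print(fields_explicit)
```

The contrast with split(" ") shows that the simplification only holds for the argument-free form.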
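Answer 1 (score: 1)
I assume every line has the same number of whitespace-separated values. If not, the longest line determines the length of the zipped result, and shorter lines are padded with None.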
from __future__ import print_function
import itertools
listfile = {}
with open("Desktop/testfile", "r") as f:
    for line in f:
        line = line.strip().split()
        gene = line[0]
        rest = line[1:]
        if gene not in listfile:
            listfile[gene] = []
        listfile[gene].append(rest)  # Keep each row as its own list
for gene in listfile:
    rows = listfile[gene]
    # izip_longest transposes the rows and pads shorter rows with None
    print(gene, list(itertools.chain(*itertools.izip_longest(*rows))))
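The difference between zipping to the shortest and to the longest row can be seen in a short Python 3 sketch, where `itertools.zip_longest` is the Python 3 name for Python 2's `izip_longest`. The uneven `rows` below are hypothetical data, not taken from the question.

```python
import itertools

# Hypothetical rows of unequal length: the second row lacks its last value.
rows = [
    ["39072094753", "D", "12"],
    ["983954875454", "G"],
]

# zip() stops at the shortest row, so the third column is dropped.
truncated = list(zip(*rows))

# zip_longest() pads the missing value with None (or a chosen fillvalue).
padded = list(itertools.zip_longest(*rows))

print(truncated)
print(padded)
```

If None placeholders are unwanted in the final list, they would have to be filtered out after flattening, or a different fillvalue passed.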
Answer 2 (score: 0)
Here is how you can do it.
largeNumber = []
letter = []
smallNumber = []
with open('data.txt', 'r') as openedFile:
    for line in openedFile:
        splittedContent = line.split()
        # One list per column; the gene name in column 0 is ignored
        largeNumber.append(splittedContent[1])
        letter.append(splittedContent[2])
        smallNumber.append(splittedContent[3])
print ('ACCA', largeNumber + letter + smallNumber)
Output:
('ACCA', ['39072094753', '983954875454', '098540980985', 'D', 'G', 'F', '12', '11', '22'])
Answer 3 (score: -1)
If you only need a comma-separated string as output, you can do this:
for gene, values in listfile.items():
    print gene + "," + ",".join(values)
I think that, for further processing, it would be useful to keep the attributes in a list.
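A minimal Python 3 sketch of that idea: the lists stay in the dictionary, and the comma-separated strings are built only for display. The `listfile` contents below are a hypothetical example in the column-first shape the question asks for, and `joined` is an illustrative name.

```python
# Hypothetical dictionary in the column-first shape the question asks for.
listfile = {
    "ACCA": ["39072094753", "983954875454", "098540980985",
             "D", "G", "F", "12", "11", "22"],
}

# str.join needs an iterable of strings, so each gene's list is joined
# separately rather than joining the (key, value) tuples from items().
joined = {gene: ",".join(values) for gene, values in listfile.items()}

for gene, text in joined.items():
    print(gene, text)
```

Because `listfile` itself is left unchanged, the individual values remain available for later processing.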
Answer 4 (score: -1)
This looks like a job for defaultdict:
from collections import defaultdict
listfile = defaultdict(list)
with open("Desktop/testfile", "r") as f:
    all_lines = (l.split() for l in f)  # Lazily split each line into fields
    for line in all_lines:
        first = line[0]
        rest = line[1:]
        listfile[first].extend(rest)