I have a list of dictionaries for 100 data points, like this:
datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100: c:9 d:1 z:12
I want to print the list to a file, like this:
a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12
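The a, b, c ... z here are only placeholders. The actual words are not known in advance, so the total number of words is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced by real words such as 'my', 'hi', 'tote', and so on.
I have been thinking of trying something like the following:
But this approach looks complicated in Python. Is there a better way to do this in Python?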
Answer 0 (score: 1)
If you don't care much about the fiddly details of column alignment, this isn't too bad:
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

# open in text mode ("w"); writing str to a "wb" file fails on Python 3
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)
which produces
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
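The question asks for row labels such as datapoint1 rather than a bare index. As a minimal sketch of one way to get that with only the standard library, `csv.DictWriter` can fill in absent keys through its `restval` argument. The function name `write_table`, the label column name `"name"`, and the generated `datapointN` labels are assumptions made for illustration, not part of the original answer; the sketch also assumes no real word collides with `"name"` and that words contain no spaces.

```python
import csv
import io


def write_table(datapoints, fp, labels=None):
    """Write one row per datapoint and one column per key; absent keys become 0."""
    # Union of all keys seen in any datapoint, sorted for a stable column order.
    keys = sorted(set().union(*datapoints))
    if labels is None:
        # Hypothetical labels: datapoint1, datapoint2, ...
        labels = ["datapoint{}".format(i) for i in range(1, len(datapoints) + 1)]
    writer = csv.DictWriter(fp, fieldnames=["name"] + keys, restval=0,
                            delimiter=" ", lineterminator="\n")
    writer.writeheader()
    for label, datapoint in zip(labels, datapoints):
        row = {"name": label}
        row.update(datapoint)
        writer.writerow(row)  # keys missing from this row are written as restval
    return keys


# Usage: write to an in-memory buffer; pass a real file opened with "w" instead.
buf = io.StringIO()
write_table([{'a': 1, 'b': 2, 'c': 6},
             {'a': 2, 'd': 8, 'p': 10},
             {'c': 9, 'd': 1, 'z': 12}], buf)
print(buf.getvalue())
```

Because the column list is built once and each row is streamed out, this should scale to thousands of distinct words without holding the full table in memory.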
As mentioned in the comments, a pandas solution is also quite nice:
>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
If you really do need nicely aligned columns, there are ways to do that too, but I have never needed them, so I know little about them.
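For readers who do want aligned columns, here is one possible sketch using only `str.ljust` and `str.rjust`: compute the widest cell in each column, then pad every cell to that width. The helper name `format_aligned` and the explicit `labels` argument are hypothetical, introduced only for this example.

```python
def format_aligned(datapoints, labels):
    """Return the table as text with every column padded to a common width."""
    keys = sorted(set().union(*datapoints))
    # First row is the header; its label cell is left empty.
    rows = [[""] + keys]
    for label, datapoint in zip(labels, datapoints):
        rows.append([label] + [str(datapoint.get(k, 0)) for k in keys])
    # Width of each column is the length of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        # Left-align the label column, right-align the numeric columns.
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        lines.append(" ".join(cells))
    return "\n".join(lines)


datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
print(format_aligned(datapoints, ["datapoint1", "datapoint2", "datapoint3"]))
```

Unlike the streaming version, this needs two passes over the data (one to measure, one to print), so all rows are held in memory at once.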
Answer 1 (score: 0)
Program:
data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]
# collect every value seen for each key
merged_data_points = {}
for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)
# print the merged data points, with keys sorted for a stable order
print('{')
for k in sorted(merged_data_points):
    print('    {0}: {1},'.format(k, merged_data_points[k]))
print('}')
Output:
{
    a: [1, 2],
    b: [2],
    c: [6, 9],
    d: [8, 1],
    e: [3],
    f: [6],
    g: [3],
    p: [10],
    z: [12],
}
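The grouping loop in this answer can also be written with `collections.defaultdict`, which removes the explicit "key not seen yet" check. This is a sketch of the same idea, not a different algorithm; the function name `merge_by_key` is hypothetical.

```python
from collections import defaultdict


def merge_by_key(data_points):
    """Map each key to the list of values it takes across all data points."""
    merged = defaultdict(list)
    for data_point in data_points:
        for key, value in data_point.items():
            # defaultdict creates the empty list on first access to a key.
            merged[key].append(value)
    return dict(merged)


data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3},
]
for key, values in sorted(merge_by_key(data_points).items()):
    print('{0}: {1}'.format(key, values))
```

Note that grouping by key this way does not record which data point each value came from, so it cannot by itself reproduce the one-row-per-datapoint table the question asks for.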