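OK, I have read several threads on Stack Overflow. I thought this would be fairly easy for me, but I'm finding that I still don't have a good grasp of Python. I tried the example at "How to combine 2 csv files with common column value, but both files have different number of lines". It was helpful, but I still haven't achieved the result I'm hoping for.
Basically, I have 2 csv files with a common first column, and I want to merge the two. For example: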
filea.csv
title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
fileb.csv
title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
output.csv (what I want, not what I get)
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956
output.csv (what I actually get)
title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951
The code I'm trying:
'''
testing merging of 2 csv files
'''
import csv
import array
import os
with open('Z:\\Desktop\\test\\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}
with open('Z:\\Desktop\\test\\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}
print str(dict1)
print str(dict2)
keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])
Any help is greatly appreciated.
Answer 0 (score: 59)
When I work with csv files, I often use the pandas library. It makes things like this very easy. For example:
import pandas as pd
a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)
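To try the snippet above without the files on disk, here is a minimal sketch that feeds the same data to pd.read_csv through io.StringIO. The FILEA and FILEB strings are assumptions copied from the question's sample data; with real files you would pass the paths instead, as above.

```python
import io

import pandas as pd

# In-memory stand-ins for filea.csv and fileb.csv (sample data from the question).
FILEA = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""

FILEB = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""

a = pd.read_csv(io.StringIO(FILEA))
b = pd.read_csv(io.StringIO(FILEB))
b = b.dropna(axis=1)             # drop the all-NaN column caused by the trailing comma
merged = a.merge(b, on="title")  # join the two frames on the shared first column

out = io.StringIO()
merged.to_csv(out, index=False)  # write without the row index
print(out.getvalue())
```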
Here is some explanation. First, we read in the csv files:
>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
title stage jan feb
0 darn 3.001 0.421 0.532
1 ok 2.829 1.036 0.751
2 three 1.115 1.146 2.921
>>> b
title mar apr may jun Unnamed: 5
0 darn 0.631 1.321 0.951 1.7510 NaN
1 ok 1.001 0.247 2.456 0.3216 NaN
2 three 0.285 1.283 0.924 956.0000 NaN
We can see that there is an extra data column (note that the first line of fileb.csv, "title,mar,apr,may,jun,", ends with an extra comma). We can easily get rid of it:
>>> b = b.dropna(axis=1)
>>> b
title mar apr may jun
0 darn 0.631 1.321 0.951 1.7510
1 ok 1.001 0.247 2.456 0.3216
2 three 0.285 1.283 0.924 956.0000
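The trailing comma is what creates the "Unnamed: 5" column. Note that dropna(axis=1) removes every column that contains at least one NaN, so it would also drop a genuine data column that happens to have a gap. As an alternative (not part of the original answer), a minimal sketch that drops only the columns pandas auto-named "Unnamed..." could look like this, assuming none of your real headers start with "Unnamed":

```python
import io

import pandas as pd

# fileb.csv from the question: the header has six fields (the last one empty),
# while each data row has five, so pandas pads the sixth column with NaN.
FILEB = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""

raw = pd.read_csv(io.StringIO(FILEB))
print(list(raw.columns))  # the empty header becomes 'Unnamed: 5'

# Keep only the columns whose names do not start with "Unnamed".
b = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(b)
```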
Now we can merge a and b on the title column:
>>> merged = a.merge(b, on='title')
>>> merged
title stage jan feb mar apr may jun
0 darn 3.001 0.421 0.532 0.631 1.321 0.951 1.7510
1 ok 2.829 1.036 0.751 1.001 0.247 2.456 0.3216
2 three 1.115 1.146 2.921 0.285 1.283 0.924 956.0000
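By default, merge performs an inner join (how="inner"), so a title that appears in only one file is silently dropped. Because the linked question is about files with different numbers of lines, here is a small sketch comparing the inner and outer joins. The frames and the "four"/"five" titles are hypothetical:

```python
import pandas as pd

# Hypothetical data: "four" appears only in a, "five" only in b.
a = pd.DataFrame({"title": ["darn", "ok", "four"], "feb": [0.532, 0.751, 1.0]})
b = pd.DataFrame({"title": ["darn", "ok", "five"], "mar": [0.631, 1.001, 2.0]})

inner = a.merge(b, on="title")               # default: only titles present in both
outer = a.merge(b, on="title", how="outer")  # every title, NaN where a file lacks it

print(inner)
print(outer)
```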
Finally, write it out:
>>> merged.to_csv("output.csv", index=False)
This produces:
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
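The index=False argument matters here. By default, to_csv also writes the DataFrame's row index as an unnamed first column. A minimal sketch with a hypothetical two-row frame:

```python
import io

import pandas as pd

# Hypothetical two-row frame for illustration.
df = pd.DataFrame({"title": ["darn", "ok"], "stage": [3.001, 2.829]})

with_index = io.StringIO()
df.to_csv(with_index)                  # default index=True: row labels 0, 1 written first
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the DataFrame's own columns

print(with_index.getvalue())
print(without_index.getvalue())
```

Also, 956 is written as 956.0 in the output above because pandas parses the jun column as floating point.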
Answer 1 (score: 1)
You need to store all of the extra columns from each file in your dictionaries, not just one of them:
dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}
Then, since the values in the dictionaries are lists, you need to concatenate the lists together:
w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])
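Putting this answer's fragments together, a minimal Python 3 sketch might look like the following. The merge_csv function name, the io.StringIO stand-ins for the files, the first-seen key order (used instead of an unordered set so the header row stays on top), and the stripping of trailing empty cells are all assumptions added for illustration:

```python
import csv
import io


def merge_csv(file_a, file_b, out):
    """Hypothetical helper: merge two CSV streams on their first column."""
    # Map first-column value -> all remaining cells, not just one of them.
    dict1 = {row[0]: row[1:] for row in csv.reader(file_a) if row}
    dict2 = {row[0]: row[1:] for row in csv.reader(file_b) if row}

    # Keys in first-seen order (file_a first) so 'title' stays the first row.
    keys = list(dict.fromkeys(list(dict1) + list(dict2)))

    w = csv.writer(out)
    for key in keys:
        row = [key] + dict1.get(key, []) + dict2.get(key, [])
        # Assumption: drop trailing empty cells, such as the one produced
        # by the trailing comma in fileb's header.
        while row and row[-1] == "":
            row.pop()
        w.writerow(row)


# In-memory stand-ins for the question's files.
FILEA = "title,stage,jan,feb\ndarn,3.001,0.421,0.532\nok,2.829,1.036,0.751\nthree,1.115,1.146,2.921\n"
FILEB = "title,mar,apr,may,jun,\ndarn,0.631,1.321,0.951,1.751\nok,1.001,0.247,2.456,0.3216\nthree,0.285,1.283,0.924,956\n"

out = io.StringIO()
merge_csv(io.StringIO(FILEA), io.StringIO(FILEB), out)
print(out.getvalue())
```

With real files on Python 3, open them with newline="" (and the output file with mode "w" rather than "wb"), as the csv module documentation recommends. Like the original answer, this does not pad missing cells, so a title that appears in only one file produces a shorter row.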