How to merge two CSV files?

Date: 2013-05-15 11:01:33

Tags: python csv

I have two CSV files like this:

"id","h1","h2","h3", ...
"1","blah","blahla"
"4","bleh","bleah"

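I want to merge these two files so that if the same id appears in both files, the values for that row come from the second file. If the ids are different, the merged file should contain both rows.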


Some values contain commas:

"54","34,2,3","blah"
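To see why such rows matter, here is the sample row parsed three ways: split on every comma, split only on the first comma, and parsed with the standard `csv` module. This is a minimal illustrative sketch; only the row itself comes from the question.

```python
import csv

row = '"54","34,2,3","blah"'

# Splitting on every comma also breaks the quoted field apart
naive = row.split(',')

# Splitting only on the first comma keeps the rest of the line intact,
# which is enough as long as the id itself never contains a comma
first, rest = row.split(',', 1)

# csv.reader honours the quotes and strips them from each field
parsed = next(csv.reader([row]))

print(naive)   # ['"54"', '"34', '2', '3"', '"blah"']
print(first, rest)
print(parsed)  # ['54', '34,2,3', 'blah']
```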

3 answers:

Answer 0 (score: 3)

res = {}

a=open('a.csv')
for line in a:
    (id, rest) = line.split(',', 1)
    res[id] = rest
a.close()

b=open('b.csv')
for line in b:
    (id, rest) = line.split(',', 1)
    res[id] = rest
b.close()

c=open('c.csv', 'w')
# Write each id back in front of the rest of its line
for id, rest in res.items():
    c.write(id+","+rest)
c.close()

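Basically, you use the first column of each line as the key in the dictionary res. Because b.csv is read second, any key that already came from the first file (a.csv) is overwritten. Finally, key and rest are joined back together in the output file c.csv.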

The header row will also be taken from the second file, but the two headers should be identical anyway.
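The same overwrite-by-key idea can be sketched with the `csv` module instead of manual splitting, so quoted fields are parsed properly. This is a minimal sketch on in-memory text via `io.StringIO`; the helper name `merge_by_first_column` and the sample data are hypothetical, and it assumes every input has the same header row.

```python
import csv
import io

def merge_by_first_column(*texts):
    """Hypothetical helper: merge CSV texts; later texts win on duplicate ids."""
    header = None
    merged = {}  # plain dicts keep insertion order in Python 3.7+
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)  # assumes every input has the same header
        for row in reader:
            if row:  # skip blank lines
                merged[row[0]] = row  # replace any earlier row with this id
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(merged.values())
    return out.getvalue()

# Illustrative data: id 54 appears in both inputs
a = '"id","h1","h2"\n"1","blah","blahla"\n"54","34,2,3","blah"\n'
b = '"id","h1","h2"\n"4","bleh","bleah"\n"54","new","value"\n'
result = merge_by_first_column(a, b)
print(result)
```

An overwritten id keeps the position where it was first seen, so the row for 54 stays between 1 and 4 but carries the values from the second input.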

Edit: a slightly different solution that merges any number of files and outputs the rows sorted by id:

res = {}
files_to_merge = ['a.csv', 'b.csv']
for filename in files_to_merge:
    f=open(filename)
    for line in f:
        (id, rest) = line.split(',', 1)
        if rest[-1] != '\n': #last line may be missing a newline
            rest = rest + '\n'
        res[id] = rest
    f.close()

f=open('c.csv', 'w')
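# Write the header row first, then drop it so it is not sorted with the data
f.write("\"id\","+res["\"id\""])
del res["\"id\""]
# Keys are the quoted id strings, so this sort is lexicographic ("10" before "2")
for id, rest in sorted(res.items()):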
    f.write(id+","+rest)
f.close()
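Because the keys above are the raw quoted id strings, `sorted` orders them as text, so "10" sorts before "2". A minimal sketch of the same any-number-of-files merge using the `csv` module and a numeric sort, assuming every file has the same header and every id is an integer; the function name `merge_sorted` and the file contents are hypothetical.

```python
import csv
import os
import tempfile

def merge_sorted(filenames, out_filename):
    """Hypothetical helper: merge CSV files (later files win) and sort by id.

    Assumes every file starts with the same header and that ids are integers.
    """
    header, rows = None, {}
    for name in filenames:
        with open(name, newline='') as f:
            reader = csv.reader(f)
            header = next(reader)
            for row in reader:
                if row:  # skip blank lines
                    rows[row[0]] = row
    with open(out_filename, 'w', newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(header)
        # Sort numerically so "2" comes before "10"
        writer.writerows(rows[k] for k in sorted(rows, key=int))

# Illustrative usage with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(['"id","h1"\n"10","a"\n"2","b"\n',
                              '"id","h1"\n"2","c"\n"1","d"\n']):
        path = os.path.join(tmp, f'in{i}.csv')
        with open(path, 'w', newline='') as f:
            f.write(text)
        paths.append(path)
    out_path = os.path.join(tmp, 'c.csv')
    merge_sorted(paths, out_path)
    with open(out_path, newline='') as f:
        result = list(csv.reader(f))

print(result)
```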

Answer 1 (score: 2)

To preserve the order of the keys and keep the last row seen for each id, you can do the following:

import csv
from collections import OrderedDict
from itertools import chain

incsv = [csv.DictReader(open(fname)) for fname in ('/home/jon/tmp/test1.txt', '/home/jon/tmp/test2.txt')]
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(incsv))
for row in rows.values(): # write out to a new file or whatever here instead
    print(row)
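The loop above only prints each row. A minimal sketch of writing the merged rows out with `csv.DictWriter` instead, assuming Python 3 and that every input shares the first file's header. In-memory `io.StringIO` objects stand in for the real files here; the sample contents are hypothetical.

```python
import csv
import io
from collections import OrderedDict
from itertools import chain

# Illustrative inputs; in practice these would be open(fname, newline='') handles
inputs = [io.StringIO('"id","h1","h2"\n"1","blah","blahla"\n"54","34,2,3","blah"\n'),
          io.StringIO('"id","h1","h2"\n"4","bleh","bleah"\n"1","new","row"\n')]

readers = [csv.DictReader(f) for f in inputs]
# Later rows replace earlier ones with the same id; an id keeps its first position
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(readers))

out = io.StringIO()  # stand-in for open('merged.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=readers[0].fieldnames,
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writeheader()
writer.writerows(rows.values())
print(out.getvalue())
```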

Answer 2 (score: 1)

Python3

import csv

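with open("a.csv", newline="") as a:
    fields = next(a)  # keep the raw header line from the first file
    D = {k: v for k,*v in csv.reader(a)}

with open("b.csv", newline="") as b:
    next(b)  # skip the second file's header
    D.update({k: v for k,*v in csv.reader(b)})  # rows from b.csv win on duplicate ids

with open("c.csv", "w", newline="") as c:
    c.write(fields)
    csv.writer(c, quoting=csv.QUOTE_ALL).writerows([k]+v for k,v in D.items())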
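To make the two key steps of this answer concrete, here is the dictionary comprehension and `update` run on two small in-memory inputs (illustrative data with the header lines omitted, Python 3):

```python
import csv
import io

a = io.StringIO('"1","blah","blahla"\n"54","34,2,3","blah"\n')
b = io.StringIO('"54","new","value"\n"4","bleh","bleah"\n')

# k, *v splits each parsed row into its id and a list of the remaining fields
D = {k: v for k, *v in csv.reader(a)}
# update() overwrites the values of ids already in D and appends new ids
D.update({k: v for k, *v in csv.reader(b)})
print(D)
```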