I have two CSV files like this:
"id","h1","h2","h3", ...
"1","blah","blahla"
"4","bleh","bleah"
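I want to merge these two files so that if the same id appears in both files, the values for that row come from the second file. If the ids differ, the merged file should contain both rows.
Some values contain commas: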
"54","34,2,3","blah"
Answer 0 (score: 3)
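res = {}
a = open('a.csv')
for line in a:
    # split only on the first comma: the id is the key, the rest is kept verbatim
    (id, rest) = line.split(',', 1)
    res[id] = rest
a.close()
b = open('b.csv')
for line in b:
    (id, rest) = line.split(',', 1)
    res[id] = rest  # an id already seen in a.csv is overwritten here
b.close()
c = open('c.csv', 'w')
for id, rest in res.items():
    c.write(id + "," + rest)
c.close()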
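Basically, you use the first column of each line as the key in the dictionary res. Because b.csv is the second file, keys that already exist from the first file (a.csv) are overwritten. Finally, key and rest are joined again when writing the output file c.csv.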
The header row is also taken from the second file, but the headers should not differ anyway.
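The overwrite behavior described above can be seen on a small in-memory example. This is only a sketch: the helper name merge_lines and the sample rows ("new", "values") are made up for illustration and are not part of the answer. It assumes the id itself never contains a comma, since the split happens at the first comma of each line.

```python
def merge_lines(*sources):
    """Merge iterables of CSV lines, keyed by the text before the first comma.

    Later sources overwrite earlier ones. Hypothetical helper, for illustration.
    """
    res = {}
    for lines in sources:
        for line in lines:
            key, rest = line.rstrip("\n").split(",", 1)
            res[key] = rest
    return res


# Sample data standing in for a.csv and b.csv (invented for this sketch).
first = ['"id","h1","h2"', '"1","blah","blahla"', '"4","bleh","bleah"']
second = ['"id","h1","h2"', '"4","new","values"', '"54","34,2,3","blah"']

merged = merge_lines(first, second)
for key, rest in merged.items():
    print(key + "," + rest)
```

The row for id "4" comes from the second list, and the value "34,2,3" survives intact because only the first comma on each line is used as a separator.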
Edit: a slightly different solution that merges an arbitrary number of files and outputs the rows in sorted order:
res = {}
files_to_merge = ['a.csv', 'b.csv']
for filename in files_to_merge:
    f = open(filename)
    for line in f:
        (id, rest) = line.split(',', 1)
        if rest[-1] != '\n':  # last line may be missing a newline
            rest = rest + '\n'
        res[id] = rest  # later files overwrite earlier ones
    f.close()
f = open('c.csv', 'w')
# write the header first, then remove it so it is not sorted with the data
f.write("\"id\"," + res["\"id\""])
del res["\"id\""]
for id, rest in sorted(res.items()):  # Python 3; use iteritems() on Python 2
    f.write(id + "," + rest)
f.close()
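One thing to keep in mind with the sorted output: the keys are quoted strings, so they sort as text rather than as numbers. If the ids are known to be integers (an assumption, the question does not say so), a key function can restore numeric order. The helper name numeric_id and the sample dictionary below are hypothetical.

```python
# Sample dictionary in the same shape as res above (invented for this sketch).
res = {
    '"4"': '"bleh","bleah"\n',
    '"10"': '"x","y"\n',
    '"1"': '"blah","blahla"\n',
}


def numeric_id(item):
    """Sort key: strip the surrounding quotes and compare ids as integers."""
    key, _rest = item
    return int(key.strip('"'))


plain = [key for key, _ in sorted(res.items())]
numeric = [key for key, _ in sorted(res.items(), key=numeric_id)]

print(plain)    # text order: "10" sorts before "4"
print(numeric)  # numeric order: 1, 4, 10
```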
Answer 1 (score: 2)
To preserve key order and keep the last row seen for each id, you can do the following:
import csv
from collections import OrderedDict
from itertools import chain

incsv = [csv.DictReader(open(fname)) for fname in ('/home/jon/tmp/test1.txt', '/home/jon/tmp/test2.txt')]
# a repeated id keeps its original position but takes the later row
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(incsv))
for row in rows.values():  # write out to new file or whatever here instead
    print(row)
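As one way to do the "write out to a new file" step, here is a minimal sketch using csv.DictWriter. It uses io.StringIO objects with invented sample data in place of the real input and output files so that it runs on its own; a plain dict is used because dicts preserve insertion order on Python 3.7+.

```python
import csv
import io
from itertools import chain

# Stand-ins for the two input files (sample data, for illustration only).
first = io.StringIO('"id","h1","h2"\n"1","blah","blahla"\n"4","bleh","bleah"\n')
second = io.StringIO('"id","h1","h2"\n"4","new","values"\n"54","34,2,3","blah"\n')

readers = [csv.DictReader(f) for f in (first, second)]
rows = {}
for row in chain.from_iterable(readers):
    rows[row["id"]] = row  # a repeated id keeps its position, takes the new row

# For a real file, use open("c.csv", "w", newline="") instead of StringIO.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=readers[0].fieldnames, quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(rows.values())
print(out.getvalue())
```

Because the csv module does the parsing and quoting, values such as "34,2,3" are read and written as a single field.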
Answer 2 (score: 1)
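import csv

# newline="" is what the csv module recommends for files it reads or writes
with open("a.csv", newline="") as a:
    fields = next(a)  # keep the raw header line
    D = {k: v for k, *v in csv.reader(a)}
with open("b.csv", newline="") as b:
    next(b)  # skip the header of the second file
    D.update({k: v for k, *v in csv.reader(b)})  # rows from b.csv win
with open("c.csv", "w", newline="") as c:
    c.write(fields)
    csv.writer(c, quoting=csv.QUOTE_ALL).writerows([k] + v for k, v in D.items())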