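I have a master table with a few fields, and I want to join a bunch of other CSVs onto it.
The data currently looks like this:
File 1:
Key Attrib1 Attrib2 Attrib3 Attrib4
File 2:
Key Attrib5
File 3:
Key Attrib6
I would like my final output to look like:
Key Attrib1 Attrib2 Attrib3 Attrib4 Attrib5 Attrib6, etc.
Not all files contain all of the keys.
Current code: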
master = "in.csv"
file1 = "file.csv"
file2 = "file2.csv"
prime = list()
D1 = {}
with open(master) as f:
    for k in csv.reader(f):
        prime.append(k[0])
for k in prime:
    with open(file1,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = dict((row[0],row[1]) for rows in rd)
    with open(file2,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = D1+dict((row[0],row[1]) for rows in rd)
Answer 0 (score: 1)
I think this is very close to what you want, if not exactly it:
import csv

master = "in.csv"
filelist = "file.csv", "file2.csv"
joined = "joined.csv"

dict1 = {}
# Python 3: the csv module expects files opened in text mode with newline=''
with open(master, 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        key = row[0]
        dict1[key] = row[1:]  # note this does not check for duplicate keys

for filename in filelist:
    with open(filename, 'r', newline='') as csvfile:
        seen = set()
        for row in csv.reader(csvfile):
            key = row[0]
            if key in dict1:
                if key in seen:
                    print('Error: duplicate key %r in file %r - ignored' %
                          (key, filename))
                else:
                    dict1[key].append(row[1])
                    seen.add(key)
            else:  # key not in master
                pass  # ignore
    # add null entry for any keys not present in this file
    # (csv.writer writes None as an empty field)
    for key in dict1:
        if key not in seen:
            dict1[key].append(None)

# write the data in the merged dictionary into a new csv file
with open(joined, 'w', newline='') as newcsvfile:
    csv.writer(newcsvfile).writerows(
        ([key]+attrlist) for key, attrlist in sorted(dict1.items()))
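To check what the merge produces without touching any files, here is a minimal sketch of the same dictionary-based join run on in-memory data. The function name `merge_on_key` and the sample rows are made up for illustration, and it pads missing values with an empty string rather than `None`; treat it as one possible way to arrange the logic, not a replacement for the code above.

```python
import csv
import io


def merge_on_key(master_rows, other_tables):
    """Join the second column of each table onto master_rows by first column.

    Keys missing from a table get an empty string; keys that are not in the
    master are ignored; only the first occurrence of a key per table is used.
    """
    merged = {row[0]: list(row[1:]) for row in master_rows}
    for table in other_tables:
        seen = set()
        for row in table:
            key = row[0]
            if key in merged and key not in seen:
                merged[key].append(row[1])
                seen.add(key)
        # pad every master key that this table did not supply
        for key in merged:
            if key not in seen:
                merged[key].append('')
    return [[key] + attrs for key, attrs in merged.items()]


# Hypothetical sample data standing in for the three files (no header rows)
master_rows = list(csv.reader(io.StringIO("a,1,2\nb,3,4\n")))
extra1 = list(csv.reader(io.StringIO("a,x\n")))
extra2 = list(csv.reader(io.StringIO("b,y\nc,z\n")))

result = merge_on_key(master_rows, [extra1, extra2])
for line in result:
    print(line)
```

Key `c` is dropped because it is not in the master, and each master key ends up with one column per extra file, blank where that file had no matching row.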
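Answer 1 (score: 0)
The idea here is to open all three files and write them into a new .csv file. The general approach I would take to joining the csv files is something like this: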
import glob
import csv

out_name = 'filename.csv'
# get all the files in the current directory that end with .csv,
# skipping the output file in case it is left over from an earlier run
csv_files = [name for name in glob.glob('*.csv') if name != out_name]
# create the new csv file, which will be your output
with open(out_name, 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=',')
    for csv_file in csv_files:
        with open(csv_file, newline='') as infile:
            reader = csv.reader(infile, delimiter=',')
            for row in reader:
                writer.writerow(row)
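You will have to adjust what "row" contains so that it matches the way your data is laid out (creating empty columns for files that lack the columns you need).
A possible solution is to define a tuple layout for each file, with blank entries in the positions that file does not fill. Writing the tuples as rows could work like this: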
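for row in reader:
    # glob returns names with their extension, so compare against 'file1.csv'
    if csv_file == 'file1.csv':
        # '' represents a blank field in column
        data_to_write = (row[0], row[1], '', row[2])
    elif csv_file == 'file2.csv':
        data_to_write = ('', row[0], row[1], row[2])
    else:
        # no layout defined for this file: write the row unchanged
        data_to_write = row
    writer.writerow(data_to_write)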
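If the files have header rows, the blank-column bookkeeping can be left to the csv module instead of hand-built tuples. The following is a minimal sketch, assuming each file starts with a header line; the file contents, the `sources` dictionary, and the `fieldnames` list are invented for illustration. `csv.DictWriter` fills any column a row lacks with its `restval` value. Note that, like the loop above, this stacks rows from each file one after another; it does not merge rows that share a key.

```python
import csv
import io

# Hypothetical file contents keyed by file name; each has its own header row.
sources = {
    'file1.csv': "Key,Attrib1\nk1,a\n",
    'file2.csv': "Key,Attrib5\nk2,b\n",
}
# Every column that should appear in the output, in order (an assumption).
fieldnames = ['Key', 'Attrib1', 'Attrib5']

out = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='',
                        lineterminator='\n')
writer.writeheader()
for name, text in sources.items():
    # DictReader maps each row to {column name: value} using the header row
    for row in csv.DictReader(io.StringIO(text)):
        writer.writerow(row)  # columns missing from row are written as ''

output_text = out.getvalue()
print(output_text)
```

With real files you would replace the `io.StringIO` objects with `open(name, newline='')` handles.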