Joining multiple files with dictionaries

Time: 2015-06-25 13:03:52

Tags: python csv join merge python-2.5

I have a master table with some fields. I want to join a bunch of other CSV files to it.

The data currently looks like this:

File 1:

Key  Attrib1  Attrib2  Attrib3  Attrib4

File 2:

Key Attrib5

File 3:

Key Attrib6

I would like my final output to look like:

Key   Attrib1  Attrib2  Attrib3  Attrib4 Attrib5 Attrib6, etc.

Not every file contains every key.

Current code:

master = "in.csv"
file1 = "file.csv"
file2 = "file2.csv"
prime = list()
D1 = {}

with open(master) as f:
    for k in csv.reader(f):
        prime.append(k[0])

for k in prime:
    with open(file1,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = dict((row[0],row[1]) for rows in rd)
    with open(file2,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = D1+dict((row[0],row[1]) for rows in rd)

2 answers:

Answer 0 (score: 1)

I think this is what you want, or at least very close to it:

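from __future__ import with_statement  # only needed on Python 2.5
import csv

master = "in.csv"
filelist = "file.csv", "file2.csv"
joined = "joined.csv"
dict1 = {}

# Python 2's csv module expects files opened in binary mode
with open(master, 'rb') as csvfile: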
    for row in csv.reader(csvfile):
        key = row[0]
        dict1[key] = row[1:]  # note this does not check for duplicate keys

for filename in filelist:
    with open(filename, 'rb') as csvfile:
        seen = set()
        for row in csv.reader(csvfile):
            key = row[0]
            if key in dict1:
                if key in seen:
                    print('Error: duplicate key %r in file %r - ignored' %
                                   (key, filename))
                else:
                    dict1[key].append(row[1])
                    seen.add(key)
            else:  # key not in master
                pass  # ignore    

        # add null entry for any keys not present in this file
        for key in dict1:
            if key not in seen:
                dict1[key].append(None)

# write the data in the merged dictionary into a new csv file
with open(joined, 'wb') as newcsvfile:
    csv.writer(newcsvfile).writerows(
        ([key]+attrlist) for key, attrlist in sorted(dict1.iteritems()))
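
For readers on Python 3, the same approach could be sketched roughly as follows. This is not the answerer's code: the function name `join_csv_files` and the `filler` parameter (an empty string instead of `None`) are assumptions made for illustration. Files are opened in text mode with `newline=''`, which is what the Python 3 `csv` module expects, and `dict.items()` replaces `iteritems()`.

```python
import csv


def join_csv_files(master, others, joined, filler=''):
    """Left-join column 1 of each file in `others` onto the rows of `master`.

    Hypothetical helper: assumes the key is in column 0 and that every
    extra file contributes exactly one attribute column.
    """
    merged = {}
    with open(master, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if row:  # skip blank lines
                # a later duplicate key silently overwrites an earlier one
                merged[row[0]] = row[1:]

    for filename in others:
        seen = set()
        with open(filename, newline='') as csvfile:
            for row in csv.reader(csvfile):
                if len(row) < 2:
                    continue  # nothing to join from this row
                key = row[0]
                # keep only the first occurrence of keys known to master
                if key in merged and key not in seen:
                    merged[key].append(row[1])
                    seen.add(key)
        # pad keys that this file did not provide
        for key in merged:
            if key not in seen:
                merged[key].append(filler)

    with open(joined, 'w', newline='') as newcsvfile:
        csv.writer(newcsvfile).writerows(
            [key] + attrs for key, attrs in sorted(merged.items()))
    return merged
```

Called as `join_csv_files("in.csv", ["file.csv", "file2.csv"], "joined.csv")`, it should write one row per master key, with one extra column per joined file. Unlike the code above, this sketch skips duplicate keys silently instead of printing a message.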

Answer 1 (score: 0)

The idea here is to open all three files and write them into a new .csv file. The general approach I would take to joining csv files is this:

import glob
import csv

# gets all the files in your directory that end with .csv
csv_files = glob.glob('*.csv')

# create the new csv file, which will be your output
with open('filename.csv', 'w') as outfile:
    writer = csv.writer(outfile, delimiter=',')

    # copy every row of every input file into the output file
    for csv_file in csv_files:
        with open(csv_file) as infile:
            reader = csv.reader(infile, delimiter=',')
            for row in reader:
                writer.writerow(row)

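You will have to manipulate what "row" contains so that it matches how your data is laid out (creating empty columns for data that lacks the columns you need).

One possible solution is to define a tuple format for each file, with blank entries in the positions that file has no data for. Writing the tuple as a row could work like this: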

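# 'file1' and 'file2' are placeholder names; glob.glob('*.csv') returns
# names that include the .csv extension
for row in reader:
    if csv_file == 'file1':
        # '' represents a blank field in column
        data_to_write = (row[0], row[1], '', row[2])
    elif csv_file == 'file2':
        data_to_write = ('', row[0], row[1], row[2])
    else:
        # any other file: write the row unchanged
        data_to_write = row
    writer.writerow(data_to_write)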
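
The per-file tuple idea can also be written as a small lookup table instead of an if/elif chain. The sketch below is only an illustration under assumptions: the names `LAYOUTS`, `WIDTH`, `pad_row` and `concat_padded` are hypothetical, the file names are placeholders, and it targets Python 3. Note that, like the snippet above, it stacks padded rows from each file one after another; it does not merge rows that share a key.

```python
import csv

# Hypothetical layout table: for each input file, the output column that
# each of its input columns should land in (output is WIDTH columns wide).
LAYOUTS = {
    'file1.csv': (0, 1, 3),  # row[0] -> col 0, row[1] -> col 1, row[2] -> col 3
    'file2.csv': (1, 2, 3),  # row[0] -> col 1, row[1] -> col 2, row[2] -> col 3
}
WIDTH = 4


def pad_row(row, positions, width=WIDTH):
    """Place each value of `row` at its output position; the rest stay ''."""
    padded = [''] * width
    for value, pos in zip(row, positions):
        padded[pos] = value
    return padded


def concat_padded(layouts, out_path, width=WIDTH):
    """Write the rows of every file in `layouts` to `out_path`, padded."""
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for csv_file, positions in layouts.items():
            with open(csv_file, newline='') as infile:
                for row in csv.reader(infile):
                    writer.writerow(pad_row(row, positions, width))
```

With this layout, `pad_row(['a', 'b', 'c'], LAYOUTS['file1.csv'])` gives `['a', 'b', '', 'c']`, matching the `'file1'` tuple above.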