Writing data from one CSV to another in Python

Time: 2016-04-08 21:43:18

Tags: python

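I have three CSV files whose attributes are Product_ID, Name, Cost, and Description. Each file contains Product_ID. I want to combine Name (file1), Cost (file2), and Description (file3) into a new CSV file that has Product_ID together with all three attributes. I need efficient code, because the files contain more than 130,000 rows.

Once all the data has been merged into the new file, I have to load it into a dictionary. For example: Product_ID as the key, with Name, Cost, and Description as the values.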

2 answers:

Answer 0 (score: 1)

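It may be more efficient to read each input .csv into a dictionary before building the aggregated result.

Here is a solution that reads each file and stores its columns in a dictionary keyed by Product_ID. I am assuming that every Product_ID value is present in every file and that each file includes a header row. I am also assuming that no column other than Product_ID is duplicated across the files.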

import csv
from collections import defaultdict

entries = defaultdict(list)
files = ['names.csv', 'costs.csv', 'descriptions.csv']
headers = ['Product_ID']

for filename in files:
    with open(filename, newline='') as f:   # Open each file in files (Python 3)
        reader = csv.reader(f)              # Create a reader to iterate over CSV rows
        heads = next(reader)                # Grab the first line (headers)

        pk = heads.index(headers[0])        # Position of 'Product_ID' in this file's headers
        # Add the rest of the headers to the list of collected columns (skip 'Product_ID')
        headers.extend(x for i, x in enumerate(heads) if i != pk)

        for row in reader:
            # For each row, add its values (except 'Product_ID') to the
            # entries dict with the row's Product_ID value as the key
            entries[row[pk]].extend(x for i, x in enumerate(row) if i != pk)

# newline='' stops the csv module from emitting blank lines on Windows
with open('result.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(headers)                # Write the headers first
    for key, value in entries.items():
        # Write the product ID followed by its collected values
        writer.writerow([key] + value)
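
The question also asks for the merged data to end up in a dictionary keyed by Product_ID. The following is only a minimal sketch of that step, assuming the merged file has the columns Product_ID, Name, Cost, and Description. The function name `load_products` and the inline sample data are hypothetical and are there only to make the example self-contained.

```python
import csv
import io

def load_products(f):
    """Map each Product_ID to a dict of its remaining columns."""
    products = {}
    for row in csv.DictReader(f):
        key = row.pop('Product_ID')   # take the key column out of the row
        products[key] = row           # what is left: Name, Cost, Description
    return products

# Hypothetical sample standing in for result.csv
sample = io.StringIO(
    "Product_ID,Name,Cost,Description\n"
    "1,Widget,9.99,A small widget\n"
    "2,Gadget,19.50,\"A gadget, with a comma\"\n"
)
products = load_products(sample)
print(products['2']['Description'])
```

With a real file, pass `open('result.csv', newline='')` instead of the in-memory sample. Within the same run this re-read is not strictly needed, since the `entries` dict built while merging already maps each Product_ID to its list of values.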

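Answer 1 (score: 0)

A general solution that produces a record (possibly incomplete) for every id encountered while processing the three files needs a dedicated data structure. Fortunately, that is just a list with a predetermined number of slots.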

d = {id:[name,None,None] for id, name in [line.strip().split(',') for line in open(fn1)]}
for line in open(fn2):
    id, cost = line.strip().split(',')
    if id in d:
        d[id][1] = cost
    else:
        d[id] = [None, cost, None]
for line in open(fn3):
    id, desc = line.strip().split(',')
    if id in d:
        d[id][2] = desc
    else:
        d[id] = [None, None, desc]

for id in d:
    if all(d[id]):
        print(','.join([id] + d[id]))
    else:
        # The record for this id is incomplete; decide for yourself
        # how to handle it. Here it is simply skipped.
        pass

If you are sure you do not want to process incomplete records any further, the code above can be simplified:

d = {id:[name] for id, name in [line.strip().split(',') for line in open(fn1)]}
for line in open(fn2):
    id, cost = line.strip().split(',')
    if id in d: d[id].append(cost)
for line in open(fn3):
    id, desc = line.strip().split(',')
    if id in d: d[id].append(desc)

for id in d:
    if len(d[id]) == 3: print(','.join([id] + d[id]))
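
One caveat with `line.strip().split(',')` is that it breaks when a field (a description, for instance) itself contains a comma. As a hedged sketch, the same slot-based merge can be written with `csv.reader`, which handles quoted fields. It assumes, like the code above, that each file has exactly two columns (id, value) and no header row. The helper name `merge_by_id` and the sample data are hypothetical.

```python
import csv
import io

def merge_by_id(sources):
    """Merge two-column (id, value) CSV sources into {id: [v0, v1, ...]}.

    Slots for values that a source does not provide stay None.
    """
    merged = {}
    for slot, source in enumerate(sources):
        for row in csv.reader(source):
            if not row:               # skip blank lines
                continue
            key, value = row          # assumes exactly two columns
            merged.setdefault(key, [None] * len(sources))[slot] = value
    return merged

# Hypothetical in-memory stand-ins for the three input files
names = io.StringIO("1,Widget\n2,Gadget\n")
costs = io.StringIO("1,9.99\n")
descs = io.StringIO('1,"Small, blue widget"\n2,Plain gadget\n')

merged = merge_by_id([names, costs, descs])
# Keep only the ids for which every slot was filled
complete = {k: v for k, v in merged.items() if None not in v}
print(complete)
```

Here id 2 has no cost, so it stays in `merged` with a `None` slot but is left out of `complete`.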