Question

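I have a folder containing four CSV files. Each CSV lists animals, with a number of occurrences for each animal. I am trying to create a single CSV that gathers the information from all the CSVs in the folder, removes duplicates, and adds a third column listing the original files in which each animal was found. For example: lion,4,'file2, file4'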

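I would really like my new CSV to have a third column listing which files contain each animal, but I can't figure it out. I tried doing it with a second dictionary (see the lines that reference locationCount). Below is the script I am currently using.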

My files:

file1.csv:
cat,1
dog,2
bird,1
rat,3

file2.csv:
bear,1
lion,1
goat,1
pig,1

file3.csv:
rat,1
bear,1
mouse,1
cat,1

file4.csv:
elephant,1
tiger,2
dog,1
lion,3

Current script:

import glob
import os
import csv, pdb

listCSV = glob.glob('*.csv')
masterCount = {}
locationCount = {}
for i in listCSV: # iterate over each csv
    filename = os.path.split(i)[1] # filename for each csv
    with open(i, 'rb') as f:
        reader = csv.reader(f)
        location = []
        for row in reader:
            key = row[0]
            location.append(filename)
            masterCount[key] = masterCount.get(key, 0) + int(row[1]) 
            locationCount[key] = locationCount.get(key, location)
writer = csv.writer(open('MasterAnimalCount.csv', 'wb'))
for key, value in masterCount.items():
    writer.writerow([key, value])

Answer 1

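You almost had it: handle the locations the same way you handle the counts.

In your script, locationCount.get(key, location) only supplies a value the first time an animal is seen, and that value is the list shared by every animal in the current file, so files seen later are never recorded for that animal.

I've renamed and shuffled a few things, but it's basically the same code structure. masterCount adds a number to the previous number, and masterLocations adds a filename to the previous list of filenames.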

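from glob import glob
import os, csv

OUTPUT = 'MasterAnimalCount.csv'
masterCount = {}      # animal -> total count across all files
masterLocations = {}  # animal -> list of files the animal appears in

for i in glob('*.csv'):
    filename = os.path.split(i)[1]
    if filename == OUTPUT:
        continue  # skip the output file left by a previous run

    with open(i, newline='') as f:
        for animal, count in csv.reader(f):
            masterCount[animal] = masterCount.get(animal, 0) + int(count)
            masterLocations[animal] = masterLocations.get(animal, []) + [filename]

# Python 3: open in text mode with newline='' (Python 2 used 'wb' here)
with open(OUTPUT, 'w', newline='') as out:
    writer = csv.writer(out)
    for animal in masterCount:
        writer.writerow([animal, masterCount[animal], ', '.join(masterLocations[animal])])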
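
For comparison, here is a minimal sketch of the same idea wrapped in a function and written for Python 3. It is not part of the original answer: the function name merge_animal_counts and its folder and output_name parameters are made up for illustration, and it assumes every non-empty row has exactly the form animal,count. It uses collections.defaultdict in place of dict.get, skips the output file if it already exists, and records each filename at most once per animal.

```python
import csv
import os
from collections import defaultdict
from glob import glob


def merge_animal_counts(folder, output_name="MasterAnimalCount.csv"):
    """Sum the counts per animal and record which files mention each one.

    Hypothetical helper: assumes each non-empty row is `animal,count`.
    """
    totals = defaultdict(int)      # animal -> summed count
    locations = defaultdict(list)  # animal -> filenames, in first-seen order

    for path in sorted(glob(os.path.join(folder, "*.csv"))):
        filename = os.path.basename(path)
        if filename == output_name:
            continue  # do not re-read the result of an earlier run
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # ignore blank lines
                animal, count = row[0], int(row[1])
                totals[animal] += count
                if filename not in locations[animal]:
                    locations[animal].append(filename)

    with open(os.path.join(folder, output_name), "w", newline="") as out:
        writer = csv.writer(out)
        for animal in sorted(totals):
            writer.writerow([animal, totals[animal], ", ".join(locations[animal])])

    return dict(totals), dict(locations)
```

Calling merge_animal_counts('.') in the folder with the four sample files should write one row per animal. Because the third field contains a comma, csv.writer quotes it, so the lion row comes out as lion,4,"file2.csv, file4.csv".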

Extract information from multiple CSV files and write a new CSV with a third column
