Merging several csv files and storing the file names as a variable - Python

Time: 2016-08-23 15:34:05

Tags: python csv

I am trying to append several CSV files into a single CSV file using Python while adding the file name (or, even better, a substring of the file name) as a new variable. All files have headers. The following script merges the files, but does not add the file name as a variable:

import glob

filenames=glob.glob("/filepath/*.csv")

outputfile=open("out.csv","a")

for line in open(str(filenames[1])):
    outputfile.write(line)

for i in range(1,len(filenames)):
    f = open(str(filenames[i]))
    f.next()
    for line in f:
        outputfile.write(line)

outputfile.close()

I was wondering if anyone has good suggestions. I have about 25k small CSV files (under 100 KB each).
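For the "substring of the file name" part, the standard library's os.path functions can strip the directory and the extension. This is a minimal sketch; the helper names file_label and file_label_part, and the underscore-separated naming scheme, are hypothetical assumptions for illustration:

```python
import os


def file_label(path):
    """Return the file name without directory or extension.

    For "/filepath/sales_2016.csv" this gives "sales_2016".
    """
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem


def file_label_part(path, sep="_", index=0):
    """Return one sep-separated piece of the file name stem.

    Hypothetical naming scheme: assumes names like "sales_2016.csv",
    where index=0 gives "sales" and index=1 gives "2016".
    """
    return file_label(path).split(sep)[index]


print(file_label("/filepath/sales_2016.csv"))       # sales_2016
print(file_label_part("/filepath/sales_2016.csv"))  # sales
```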

2 answers:

Answer 0 (score: 0)

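A simple change will achieve your goal. For the first line:

outputfile.write(line) -> outputfile.write(line.rstrip("\n") + ",file\n")

and afterwards:

outputfile.write(line.rstrip("\n") + "," + filenames[i] + "\n")

Each line read from a file still ends with its newline, so strip it before appending the new field; otherwise the file name ends up at the start of the following line.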
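A runnable Python 3 sketch of this line-based approach, assuming every file has the same header line, no field contains an embedded newline, and no file name contains a comma (plain string concatenation does not quote anything). The function name merge_with_filename and the column name "file" are illustrative choices, not part of the answer:

```python
def merge_with_filename(filenames, output_path, column_name="file"):
    """Concatenate CSV files line by line, appending the source name as a column.

    Assumes identical headers, no embedded newlines in fields and no
    commas in the file names, since nothing is quoted here.
    """
    with open(output_path, "w") as out:
        for n, name in enumerate(filenames):
            with open(name) as f:
                header = f.readline().rstrip("\n")
                if n == 0:
                    # Write the header once, taken from the first file only.
                    out.write(header + "," + column_name + "\n")
                for line in f:
                    line = line.rstrip("\n")
                    if line:  # skip blank lines, e.g. a trailing empty line
                        out.write(line + "," + name + "\n")
```

Usage would be along the lines of merge_with_filename(sorted(glob.glob("/filepath/*.csv")), "out.csv"). Keeping out.csv outside /filepath/ stops the output from being picked up as an input on a second run.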

Answer 1 (score: 0)

You can use Python's csv module to parse the CSV files for you and to format the output. Example code (untested):

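import csv
import glob

filenames = glob.glob("/filepath/*.csv")  # same input pattern as in the question
output_filename = "out.csv"

# Python 2: the csv module expects files opened in binary mode
with open(output_filename, "wb") as outfile:
    writer = None  # created once the first file's header is known
    for input_filename in filenames:
        with open(input_filename, "rb") as infile:
            reader = csv.DictReader(infile)
            if writer is None:
                # Output header: the new "Filename" column plus the input columns
                field_names = ["Filename"] + reader.fieldnames
                writer = csv.DictWriter(outfile, field_names)
                writer.writeheader()
            for row in reader:
                row["Filename"] = input_filename
                writer.writerow(row)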
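The code above targets Python 2 ("rb"/"wb"). A Python 3 sketch of the same DictReader/DictWriter approach opens the files in text mode with newline="" and stores only the file name stem (no directory or extension), which covers the substring variant from the question. The function name merge_csv_files, skipping empty files, and ignoring columns that are not in the first file's header (extrasaction="ignore") are assumptions, not part of the answer:

```python
import csv
import os


def merge_csv_files(filenames, output_path, label_column="Filename"):
    """Merge CSV files into output_path, adding a column with each file's stem."""
    with open(output_path, "w", newline="") as outfile:
        writer = None  # created from the first non-empty file's header
        for input_filename in filenames:
            # "/filepath/north_2016.csv" -> "north_2016"
            label = os.path.splitext(os.path.basename(input_filename))[0]
            with open(input_filename, newline="") as infile:
                reader = csv.DictReader(infile)
                if reader.fieldnames is None:
                    continue  # empty file: no header, nothing to copy
                if writer is None:
                    writer = csv.DictWriter(
                        outfile,
                        [label_column] + reader.fieldnames,
                        # Drop columns that are not in the first file's header.
                        extrasaction="ignore",
                    )
                    writer.writeheader()
                for row in reader:
                    row[label_column] = label
                    writer.writerow(row)
```

Because the csv module handles quoting, a field such as "a, b" survives the merge intact, which the plain line-concatenation approach cannot guarantee.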

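A few notes:

  • Always open files with with. This ensures they are closed again once you are done with them. Your code does not close the input files properly.
  • In Python 2, CSV files should be opened in binary mode ("rb"/"wb"). In Python 3, open them in text mode with newline="" instead.
  • Indices in Python start at 0. Your code skips the first file and includes the lines of the second file twice. If you just want to iterate over a list, you don't need indices in Python; simply use for x in my_list instead.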
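To illustrate the indexing note, the sketch below lists which file each pass of the question's loops reads (without opening any files), next to a direct iteration that uses enumerate only to keep the header from the first file. The function names are hypothetical:

```python
def question_loop_order(filenames):
    """(file, header_kept) pairs in the order the question's loops read them."""
    # First loop: filenames[1] is copied whole, header included.
    order = [(filenames[1], True)]
    # Second loop: range(1, ...) starts at index 1 again; f.next() drops the header.
    for i in range(1, len(filenames)):
        order.append((filenames[i], False))
    return order


def fixed_loop_order(filenames):
    """Same pairs when the list is iterated directly."""
    # enumerate supplies a counter only to know which file comes first.
    return [(name, n == 0) for n, name in enumerate(filenames)]


files = ["a.csv", "b.csv", "c.csv"]
print(question_loop_order(files))  # [('b.csv', True), ('b.csv', False), ('c.csv', False)]
print(fixed_loop_order(files))     # [('a.csv', True), ('b.csv', False), ('c.csv', False)]
```

a.csv never appears in the first list, and b.csv appears twice, which is exactly the behavior the note describes.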