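Merge all CSV files in a folder and add a new column containing the original file's name in Python

Posted: 2016-02-05 20:18:14

Tags: python csv

I'm trying to merge all the CSV files in a folder into one large CSV file. I also need to add a new column to the merged CSV that shows which original file each row came from. Here is my code so far: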

import csv
import glob


read_files = glob.glob("*.csv")

source = []

with open("combined.files.csv", "wb") as outfile:
    for f in read_files:
        source.append(f)
        with open(f, "rb") as infile:
            outfile.write(infile.read())

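I know I somehow have to repeat each f once for every row in its CSV and then write it out as a new column in the .write call, but I don't know how to do that. Thanks, everyone!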

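1 answer:

Answer 0 (score: 5)

If you add the filename as the last column, you don't need to parse the CSV at all (as long as no quoted field contains a line break). Just read each file line by line, append the filename, and write the result. And don't open the files in binary mode!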

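import glob
import os

out_filename = "combined.files.csv"
# Delete any earlier output first so that glob() below does not pick it up as an input
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")
with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            for line in infile:
                line = line.rstrip("\n")
                if line:  # skip blank lines, e.g. a trailing empty line
                    # append the source filename as the last column
                    outfile.write('{},{}\n'.format(line, filename))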
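The line-by-line shortcut breaks if a quoted field contains a line break (the filename would land in the middle of that field) or if a filename itself contains a comma. A minimal sketch using the standard csv module, which parses and re-quotes fields, follows; `tag_rows` is a hypothetical helper name, and it works on an in-memory string only to keep the example self-contained.

```python
import csv
import io


def tag_rows(csv_text, filename):
    """Return csv_text with filename appended as the last column of every row.

    Hypothetical helper: parsing with the csv module keeps quoted fields
    intact, including fields that contain commas or embedded newlines.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # csv.reader yields [] for blank lines; skip them
            writer.writerow(row + [filename])
    return out.getvalue()


sample = 'id,note\n1,"hello, world"\n2,"two\nlines"\n'
print(tag_rows(sample, "a,b.csv"))
```

Because the filename `a,b.csv` contains a comma, csv.writer quotes it automatically, and the multi-line `note` field stays a single field.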

If your CSV files share a common header row, write one copy of it to the output file and skip the rest:

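import os
import glob

want_header = True
out_filename = "combined.files.csv"

# Delete any earlier output so that glob() below does not treat it as an input
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            header = next(infile, None)
            if header is None:  # empty file: nothing to copy
                continue
            if want_header:
                # write the first file's header once, plus a new "Filename" column
                outfile.write('{},Filename\n'.format(header.rstrip("\n")))
                want_header = False
            # every other file's header was consumed by next() above and is dropped
            for line in infile:
                line = line.rstrip("\n")
                if line:  # skip blank lines
                    outfile.write('{},{}\n'.format(line, filename))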
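A file-based version of the header-handling approach that also copes with quoted fields can be sketched with csv.reader/csv.writer. Assumptions: every input file has the same header row; `merge_csvs` and its `filename_header` parameter are hypothetical names; instead of deleting the old output, it excludes the output path from the glob results; and it records only the base name of each source file.

```python
import csv
import glob
import os


def merge_csvs(pattern, out_filename, filename_header="Filename"):
    """Merge CSV files matching pattern into out_filename (hypothetical helper).

    Assumes all inputs share one header row: the first file's header is
    written once (plus filename_header) and the other headers are skipped.
    Each data row gets its source file's base name appended.
    Returns the list of input files that were merged.
    """
    # Exclude the output file so a previous run's result is never re-read.
    out_abs = os.path.abspath(out_filename)
    inputs = sorted(f for f in glob.glob(pattern) if os.path.abspath(f) != out_abs)

    header_written = False
    with open(out_filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for filename in inputs:
            with open(filename, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)
                if header is None:  # empty file: nothing to copy
                    continue
                if not header_written:
                    writer.writerow(header + [filename_header])
                    header_written = True
                for row in reader:
                    if row:  # skip blank lines
                        writer.writerow(row + [os.path.basename(filename)])
    return inputs
```

Usage: `merge_csvs("*.csv", "combined.files.csv")`. Opening the files with `newline=""` follows the csv module's documented recommendation, so line breaks inside quoted fields survive the round trip.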