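I'm trying to merge all the CSV files in a folder into one large CSV file. I also need to add a new column to the combined CSV that shows which original file each row came from. Here is the code I have so far: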

    import csv
    import glob
    read_files = glob.glob("*.csv")
    source = []
    with open("combined.files.csv", "wb") as outfile:
        for f in read_files:
            source.append(f)
            with open(f, "rb") as infile:
                outfile.write(infile.read())

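I know I somehow have to repeat each f once for every row in its CSV and then append it as a new column in the .write call, but I don't know how to do that. Thanks, everyone!

Answer 0 (score: 5)

If you add the filename as the last column, you don't need to parse the CSV at all. Just read each file line by line, append the filename, and write the line out. And don't open the files in binary mode!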

    import glob
    import os

    out_filename = "combined.files.csv"
    # Remove output from a previous run so the glob below does not pick it up.
    if os.path.exists(out_filename):
        os.remove(out_filename)
    read_files = glob.glob("*.csv")
    with open(out_filename, "w") as outfile:
        for filename in read_files:
            with open(filename) as infile:
                for line in infile:
                    # Drop the line ending, then append the source filename.
                    outfile.write('{},{}\n'.format(line.strip(), filename))

If your CSV files share a common header row, write one copy of it to the output file and skip the rest:

    import os
    import glob

    want_header = True
    out_filename = "combined.files.csv"
    # Remove output from a previous run so the glob below does not pick it up.
    if os.path.exists(out_filename):
        os.remove(out_filename)
    read_files = glob.glob("*.csv")
    with open(out_filename, "w") as outfile:
        for filename in read_files:
            with open(filename) as infile:
                if want_header:
                    # First file: keep its header and name the new column.
                    outfile.write('{},Filename\n'.format(next(infile).strip()))
                    want_header = False
                else:
                    # Later files: skip their header line.
                    next(infile)
                for line in infile:
                    outfile.write('{},{}\n'.format(line.strip(), filename))

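
The line-by-line approach assumes every record sits on exactly one line and that filenames contain no commas. If a quoted field can contain a line break, or a filename might need quoting, parsing with the standard `csv` module is the safer route. Below is a minimal sketch of that variant, assuming Python 3; the function name `combine_csv_files` and its `has_header` parameter are hypothetical and not part of the answer above.

```python
import csv
import os


def combine_csv_files(paths, out_path, has_header=True):
    """Merge CSV files into out_path, appending a source-filename column.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    header_written = False
    # newline="" lets the csv module manage line endings itself.
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, newline="") as infile:
                reader = csv.reader(infile)
                if has_header:
                    header = next(reader, None)  # None if the file is empty
                    if header is None:
                        continue
                    if not header_written:
                        # Keep only the first header and name the new column.
                        writer.writerow(header + ["Filename"])
                        header_written = True
                for row in reader:
                    # Append the file's base name as the last field.
                    writer.writerow(row + [os.path.basename(path)])
```

To reproduce the behaviour of the answer, one could call it as `combine_csv_files(sorted(p for p in glob.glob("*.csv") if p != "combined.files.csv"), "combined.files.csv")` after `import glob`. Filtering out the output name plays the same role as deleting the old output file first, and sorting makes the row order predictable.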