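I have several single-column CSV files in a directory named path. I want to merge all of these columns into one file and store it as out.csv in a directory named repsim.
Here is my code.
Assume I have already built the list files and set fin=files[0]: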
def ajouter (fin, files, out, path, repsim, delim=';'):
    fic=os.path.join(path,fin)
    with open(fic, 'rb') as fr:
        print fic + " est overt"
        tout=[]
        for i in range(1, len(files)):
            fil=files[i]
            print fil + " en cours -------------"
            f=os.path.join(path,fil)
            with open(f, 'rb') as fi:
                fr_reader = csv.reader(fr, delimiter=delim)
                fi_reader = csv.reader(fi, delimiter=delim)
                for row1, row2 in zip(fr_reader, fi_reader):
                    row2.append(row1[-1])
                    tout.append(row2)
    fout=os.path.join(repsim,out)
    with open(fout, 'ab') as output:
        writer = csv.writer(output, delimiter=delim)
        writer.writerows(tout)
This code only gives me a two-column file containing the column from files[0] and the column from the last file in files.
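One behaviour that may contribute to the missing columns: the code builds a new csv.reader on the same open handle fr for every file, but a file handle that has been read to the end yields nothing on later passes. The sketch below illustrates this in Python 3 with in-memory stand-ins (first and second are hypothetical names, not part of the question); it is an illustration of the effect, not a full diagnosis.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two single-column CSV files.
first = io.StringIO("a\nb\n")
second = io.StringIO("p\nq\n")

# First pass: the reader consumes every line of `first`.
pass_one = list(zip(csv.reader(first), csv.reader(second)))

# Second pass over the same handle: `first` is already at end-of-file,
# so zip() has nothing to pair and produces no rows.
second.seek(0)
pass_two = list(zip(csv.reader(first), csv.reader(second)))

print(len(pass_one), len(pass_two))
```

In addition, row2.append(row1[-1]) only ever extends the row of the file currently being read, so no row can grow beyond two columns.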
Answer 0 (score: 0)
This may be an interesting exercise for you, but it is reinventing the wheel. The task is straightforward with, for example, pandas:
import pandas as pd
dataframes = [pd.read_csv(p) for p in ("data1.csv", "data2.csv")]
merged_dataframe = pd.concat(dataframes, axis=1)
merged_dataframe.to_csv("merged.csv", index=False)
Input:
$ cat data1.csv
data1
a
b
c
d
$ cat data2.csv
data2
p
q
r
s
Output:
$ cat merged.csv
data1,data2
a,p
b,q
c,r
d,s
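The same pandas approach can be adapted to the setup in the question. This is a minimal sketch under assumptions: the function name merge_columns is hypothetical, the files are taken to have no header row (hence header=None), and the ';' delimiter is carried over from the question's code.

```python
import os

import pandas as pd


def merge_columns(files, out, path, repsim, delim=';'):
    """Concatenate the columns of several CSV files side by side."""
    # header=None is an assumption: the files are taken to contain data only.
    frames = [
        pd.read_csv(os.path.join(path, name), sep=delim, header=None)
        for name in files
    ]
    # axis=1 places the frames next to each other, aligned on row position.
    merged = pd.concat(frames, axis=1)
    merged.to_csv(os.path.join(repsim, out), sep=delim, index=False, header=False)
```

If the files differ in length, pd.concat keeps every row and leaves the missing cells empty in the output.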
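Answer 1 (score: 0)
itertools.izip (Python 2) is well suited to this kind of task because it does not require you to read the files into memory. It works like zip, except that it returns an iterator rather than a list of tuples. The following should work.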
import csv
import os
from itertools import izip, chain

def ajouter(files, out, path, repsim, delim=';'):
    # Track opened files so that the finally block can always close them.
    open_files = []
    try:
        # Python 2: the csv module expects files opened in binary mode.
        for file_ in files:
            open_files.append(open(os.path.join(path, file_), 'rb'))
        readers = (csv.reader(f, delimiter=delim) for f in open_files)
        # izip yields one row per file at a time; chain flattens them into one output row.
        merged_cols = (tuple(chain.from_iterable(row)) for row in izip(*readers))
        with open(os.path.join(repsim, out), 'wb') as output:
            writer = csv.writer(output, delimiter=delim)
            writer.writerows(merged_cols)
    finally:
        for open_file in open_files:
            open_file.close()
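itertools.izip no longer exists in Python 3, where the built-in zip is already lazy. The following is a sketch of how the same idea might look there, not a drop-in replacement: the name merge_columns is hypothetical, and contextlib.ExitStack is swapped in for the try/finally block to close the files.

```python
import csv
import os
from contextlib import ExitStack


def merge_columns(files, out, path, repsim, delim=';'):
    """Write the columns of every file in `files` side by side into repsim/out."""
    with ExitStack() as stack:
        # ExitStack closes every file opened so far, even if a later open() fails.
        readers = [
            csv.reader(
                stack.enter_context(open(os.path.join(path, name), newline='')),
                delimiter=delim,
            )
            for name in files
        ]
        output = stack.enter_context(
            open(os.path.join(repsim, out), 'w', newline='')
        )
        writer = csv.writer(output, delimiter=delim)
        # zip() is lazy in Python 3 and stops at the shortest file.
        for rows in zip(*readers):
            # rows holds one row from each file; flatten them into one output row.
            writer.writerow([cell for row in rows for cell in row])
```

Because zip stops at the shortest input, extra rows in longer files are dropped; itertools.zip_longest is an alternative if those rows should be kept.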