Question

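I have information spread across several large CSV files. I want to merge all of them into one new file, so that the first row of the first file is joined with the first row of the next file, and so on.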

file1.csv

A,B
A,C
A,D

file2.csv

F,G
H,I
J,K

Expected result:

output.csv

A,B,F,G
A,C,H,I
A,D,J,K

So, given a list ['file1.csv', 'file2.csv', ...], how do I go on from here?

I tried loading every file into memory and merging them with np.column_stack, but my files are too large to fit in memory.

Answer 1

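Not pretty code, but this should work.

I did not use with open('filename', 'r') as myfile for the inputs. That could get a bit messy with 50 files, so the files are opened and closed explicitly.

Open each file and put the handles in a list. The first handle serves as the master file: we iterate over it line by line, each time reading one line from every other open file, joining them with ',' and writing the result to the output file.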

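Note that if the other files contain more lines than the master file, those extra lines are not included. If they contain fewer lines, readline() returns an empty string at end of file, so the missing fields are silently left empty. I'll leave it to you to handle these cases gracefully.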

Also note that if the names follow a logical pattern, you can use glob to build filelist (thanks to N. Wouda below).
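As a minimal sketch of the glob idea, assuming the inputs all match a pattern such as book*.csv (the pattern and the helper name list_csv_files are illustrative, not part of the original answer):

```python
import glob


def list_csv_files(pattern):
    """Return the paths matching pattern, in sorted order.

    glob.glob makes no promise about ordering, so the result is sorted
    to keep the column order of the merged file reproducible.
    """
    return sorted(glob.glob(pattern))


# Hypothetical usage: filelist = list_csv_files('book*.csv')
```

The sort is lexicographic, so book10.csv comes before book2.csv. Zero-padded names (book02.csv) avoid that surprise.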

filelist = ['book1.csv','book2.csv','book3.csv','book4.csv']
openfiles = []
for filename in filelist:
    # Text mode ('r'), so each line is a str that can be joined with ','
    openfiles.append(open(filename, 'r'))

# Use first file in the list as the master
# All files must have same number of lines (or greater)
masterfile = openfiles.pop(0) 

with (open('output.csv','w')) as outputfile:
    for line in masterfile:
        outputlist = [line.strip()]
        for openfile in openfiles:
            outputlist.append(openfile.readline().strip())
        outputfile.write(str.join(',', outputlist)+'\n')

masterfile.close()
for openfile in openfiles:
    openfile.close()

Input file

a   b   c   d   e   f
1   2   3   4   5   6
7   8   9   10  11  12
13  14  15  16  17  18

Output

a   b   c   d   e   f   a   b   c   d   e   f   a   b   c   d   e   f   a   b   c   d   e   f
1   2   3   4   5   6   1   2   3   4   5   6   1   2   3   4   5   6   1   2   3   4   5   6
7   8   9   10  11  12  7   8   9   10  11  12  7   8   9   10  11  12  7   8   9   10  11  12
13  14  15  16  17  18  13  14  15  16  17  18  13  14  15  16  17  18  13  14  15  16  17  18
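The unequal-length cases mentioned in this answer could be handled along the following lines. This is only a sketch for Python 3, not the answer's own code: the function name merge_rows and the strict flag are assumptions made for illustration. It uses contextlib.ExitStack to close every file automatically and itertools.zip_longest to notice when one file ends before the others.

```python
from contextlib import ExitStack
from itertools import zip_longest


def merge_rows(filelist, output_path, strict=True):
    """Join line i of every input file into line i of output_path."""
    missing = object()  # sentinel marking a file that has run out of lines
    with ExitStack() as stack:
        # ExitStack closes every opened file, even if an error occurs.
        handles = [stack.enter_context(open(name, 'r', newline=''))
                   for name in filelist]
        out = stack.enter_context(open(output_path, 'w', newline=''))
        rows_iter = zip_longest(*handles, fillvalue=missing)
        for lineno, rows in enumerate(rows_iter, start=1):
            if strict and any(row is missing for row in rows):
                raise ValueError(
                    'input files differ in length at line %d' % lineno)
            # In non-strict mode an exhausted file contributes one empty field.
            fields = ['' if row is missing else row.rstrip('\r\n')
                      for row in rows]
            out.write(','.join(fields) + '\n')
```

Only one line per file is held in memory at a time, so file size should not matter. With strict=False a short file is padded with a single empty field per row, which may not match its real column count, so that mode is only a starting point.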

Answer 2

Instead of reading the files fully into memory, you can iterate over them line by line.

from itertools import izip # like zip but gives us an iterator

with open('file1.csv') as f1, open('file2.csv') as f2, open('output.csv', 'w') as out:
    for f1line, f2line in izip(f1, f2):
        out.write('{},{}'.format(f1line.strip(), f2line))

Demo:

$ cat file1.csv 
A,B
A,C
A,D
$ cat file2.csv 
F,G
H,I
J,K
$ python2.7 merge.py
$ cat output.csv 
A,B,F,G
A,C,H,I
A,D,J,K
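The snippet in this answer targets Python 2.7, where izip lives in itertools. Under Python 3 izip no longer exists, and the built-in zip is already lazy. A rough Python 3 equivalent, assuming the same two-file setup (the function name merge_two is only illustrative), might look like this:

```python
def merge_two(path1, path2, output_path):
    """Write line i of path1 and line i of path2 as one comma-joined line."""
    with open(path1) as f1, open(path2) as f2, \
            open(output_path, 'w') as out:
        # The built-in zip yields one pair of lines at a time, so neither
        # file is ever loaded into memory as a whole.
        for line1, line2 in zip(f1, f2):
            out.write('{},{}\n'.format(line1.rstrip('\n'),
                                       line2.rstrip('\n')))
```

Like izip, zip stops at the shorter input, so trailing lines of a longer file are dropped without warning.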
