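Merging single-column CSV files into a single CSV file

Time: 2017-10-18 13:04:05

Tags: python csv

I have seen some answers to this kind of question here, but not enough to really help me. I read a 9-column .csv file and wrote its columns into vectors for other work in C++. I then wrote them back to a folder as single-column .csv files, each looking basically like this: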

date
20171012
20171011
20171010
20171009
20171006
20171005
20171004

Now I want to combine all 9 of these simple csv files into one file again, so that they sit side by side horizontally, like this in the new file:

date,value,etc...     
20171012,2501593,etc..
20171011,2176309,etc..
20171010,3484064,etc..
20171009,1785852,etc..
20171006,1785852,etc..
20171005,16476641,etc..
20171004,1235406,etc..

I hope this is easy to understand. My code is as follows:

import csv
data = [] # Buffer list
files = ['./CalculatedOutput/quote_date.csv', './CalculatedOutput/paper.csv', './CalculatedOutput/exch.csv', './CalculatedOutput/open.csv', './CalculatedOutput/high.csv', './CalculatedOutput/low.csv', './CalculatedOutput/close.csv', './CalculatedOutput/volume.csv', './CalculatedOutput/value.csv']

for filename in files:
    with open(filename, 'r') as csvfile:
        stocks = csv.reader(csvfile)
        for row in stocks:
            new_row = [row[0]]
            data.append(new_row)
        with open("CalculatedOutput/Opera.csv", "w+") as to_file:
            writer = csv.writer(to_file , delimiter=",")
            for new_row in data:
                writer.writerow(new_row)

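This code moves all the rows of every column into a new file, but it just places them one below the other. How can I put the columns next to each other, comma-separated? I have tried extensively with Pandas, numpy and the csv lib, following what is said about concat, merge and others, but I can't find the right way. I don't think I'm that far off, but unfortunately my Python isn't the best!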

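2 Answers:

Answer 0: (score: 3)

You can open all the files with a single context manager using contextlib.ExitStack (in Python 3), then write to the output file after applying zip to the iterable of files: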

import csv
from contextlib import ExitStack

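outfile = "CalculatedOutput/Opera.csv"
# files is the list of input paths from the question
# newline="" is what the csv module recommends for files it reads or writes
with ExitStack() as stack, open(outfile, "w", newline="") as to_file:
    # open all input files; ExitStack closes every one of them on exit
    fs = [stack.enter_context(open(fname, newline="")) for fname in files]
    readers = map(csv.reader, fs)
    # zip yields one row from each file at a time; every row is a
    # one-element list, so take row[0] to build a flat output row
    csv.writer(to_file).writerows(
        [row[0] for row in rows] for rows in zip(*readers)
    )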
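
To see why zip places the columns next to each other, here is a minimal in-memory sketch. The two io.StringIO objects and their contents are hypothetical stand-ins for two of the single-column files, not the real data:

```python
import csv
import io

# Hypothetical stand-ins for two of the single-column files.
date_file = io.StringIO("date\n20171012\n20171011\n")
value_file = io.StringIO("value\n2501593\n2176309\n")

readers = [csv.reader(f) for f in (date_file, value_file)]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# zip(*readers) yields one row from each reader at a time;
# each row is a one-element list, so row[0] is the cell value.
for rows in zip(*readers):
    writer.writerow([row[0] for row in rows])

merged = out.getvalue()
print(merged)
```

Note that zip stops at the shortest input, so if the files could differ in length, itertools.zip_longest may be the safer choice.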

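Update:

If the files contain characters that cannot be decoded with the default encoding used by open (platform-dependent, often UTF-8), you can have the undecodable bytes turned into intermediate surrogate characters when reading, and have them restored to the original bytes when writing: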

with ExitStack() as stack, open(outfile, "w+", errors='surrogateescape') as to_file:
    fs = [stack.enter_context(open(fname, errors='surrogateescape')) for fname in files]
    ...
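
As a rough illustration of what errors='surrogateescape' does, the sketch below round-trips a byte string that is not valid UTF-8. The sample bytes are an assumption chosen for the demonstration:

```python
# Latin-1 encoded bytes; 0xE9 on its own is not valid UTF-8 (assumed sample data).
raw = b"caf\xe9\n"

# Decoding maps the undecodable byte to a lone surrogate instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the original byte.
restored = text.encode("utf-8", errors="surrogateescape")

print(restored == raw)
```

The text in between is only a carrier for the bytes; it round-trips unchanged as long as both the read and the write use the same error handler.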

Answer 1: (score: 1)

I read that you tried pandas; what went wrong there? With pandas we can simply use pd.concat([df1, df2, ...]). So let's read the files in and concatenate them:

import pandas as pd

df = pd.concat((pd.read_csv(f) for f in files), axis=1)  # axis=1 concatenates horizontally
df.to_csv("CalculatedOutput/Opera.csv",index=False)

Example:

Let's first create two dummy files:

file1 = """date
20171012
20171011
20171010
20171009
20171006
20171005
20171004"""

file2 = """number
1
2
3
4
5
6
7"""

import io
import pandas as pd

# wrap the strings in file-like objects so read_csv can consume them
files = [io.StringIO(f) for f in [file1, file2]]

df = pd.concat([pd.read_csv(f) for f in files],axis=1)

print(df)
       date  number
0  20171012       1
1  20171011       2
2  20171010       3
3  20171009       4
4  20171006       5
5  20171005       6
6  20171004       7
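
To check what would end up in the output file, the following sketch renders the concatenated frame as CSV text instead of writing it to disk. The shortened sample strings are assumptions that mirror the dummy files above:

```python
import io

import pandas as pd

# Shortened, hypothetical versions of the two dummy files above.
file1 = "date\n20171012\n20171011\n"
file2 = "number\n1\n2\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (file1, file2)]
df = pd.concat(frames, axis=1)

# With no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
lines = csv_text.splitlines()
print(lines)
```

Because pd.concat with axis=1 aligns on the row index, inputs of different lengths would leave empty cells in the shorter columns.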