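I am working with data in CSV format that has four columns and 912,500 rows. I need to convert the data in each column into 365 columns and 2,500 rows, written to a separate CSV file per column. For example: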
Col1 Col2 Col3 Col4
1 33 36 38
2 25 18 56
...
365 -4 -3 10
366 -11 20 35
367 12 18 27
...
730 26 36 27
...
912500 20 37 42
Desired output:
Col1 Col2 Col3 Col4 Col5 ..... Col365
1 33 25 ... -4
2 -11 12 ... 26
3 ...
4 ...
5 ...
...
2500 ...
Could you tell me how to write a script for this? Any help would be highly appreciated.
Answer 0 (score: 0)
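Try NumPy as suggested in the comments. However, if you want to write the code yourself, you can take the following approach:
Read the file one line at a time.
Split each line, using the comma as the delimiter.
Discard the "row number" (the first element of the list you get from the split operation). You have to keep your own row count.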
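The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `reshape_lines`, `col_index` and `width` are hypothetical, and it assumes the input has no header row, uses commas, and carries the row number in the first field.

```python
def reshape_lines(lines, col_index, width=365, delimiter=","):
    """Yield output rows of `width` values taken from one data column.

    `col_index` counts data columns after the row number has been dropped
    (0 for the first data column). Hypothetical helper for illustration.
    """
    row_count = 0   # our own row counter; the one in the file is discarded
    current = []    # values collected for the output row being built
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) < 2:
            continue                      # skip blank or malformed lines
        values = fields[1:]               # drop the row number
        current.append(values[col_index])
        if len(current) == width:
            row_count += 1
            yield [row_count] + current
            current = []
    if current:                           # leftover values, if any
        row_count += 1
        yield [row_count] + current


def reshape_file(in_path, out_path, col_index, width=365):
    """Read `in_path` line by line and write the reshaped column to `out_path`."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for row in reshape_lines(infile, col_index, width):
            outfile.write(",".join(str(v) for v in row) + "\n")
```

Calling `reshape_file` once per data column (with `col_index` 0, 1 and 2) would give one output file per column. Because the file is read line by line, only one output row is held in memory at a time.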
Answer 1 (score: 0)
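csv.reader will create an iterator that reads the CSV row by row. You can then feed it into itertools.chain, which iterates over each row in turn and outputs the individual column values. Now that you have a stream of values, you can group them into new rows of whatever size you want. There are several ways to rebuild those rows; in my example I used itertools.groupby.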
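import itertools
import csv

def groupby_count(iterable, count):
    """Yield consecutive tuples of `count` items from `iterable`."""
    counter = itertools.count()
    # items whose running index gives the same value of index // count form one group
    for _, grp in itertools.groupby(iterable, lambda _: next(counter) // count):
        yield tuple(grp)

def reshape_csv(in_filename, out_filename, colsize):
    # newline='' is what the csv module documentation recommends for file objects
    with open(in_filename, newline='') as infile, open(out_filename, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter=' ')
        writer = csv.writer(outfile, delimiter=' ')
        # flatten all rows into one stream of cells, then regroup them
        col_iter = itertools.chain.from_iterable(reader)
        writer.writerows(groupby_count(col_iter, colsize))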
Here is a sample script to test it. I used fewer columns, though:
import os

infn = "intest.csv"
outfn = "outtest.csv"
orig_colsize = 4
new_colsize = 15

# test input file
with open(infn, "w") as infp:
    for i in range(32):
        infp.write(' '.join('c{0:02d}_{1:02d}'.format(i, j) for j in range(orig_colsize)) + '\n')

# remove stale output file
try:
    os.remove(outfn)
except OSError:
    pass

# run it and print
reshape_csv(infn, outfn, new_colsize)
print('------- test output ----------')
with open(outfn) as outfp:
    print(outfp.read())
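This answer notes that there are several ways to rebuild the rows from the flattened stream. One alternative to itertools.groupby, sketched here purely as an illustration, is itertools.islice. The helper name `chunked` and the sample data are made up and are not part of the answer.

```python
import itertools

def chunked(iterable, size):
    """Yield consecutive tuples of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, size))
        if not chunk:        # the iterator is exhausted
            return
        yield chunk

# Flatten a few small rows into one stream of cells, then regroup them.
rows = [["a", "b"], ["c", "d"], ["e", "f"]]
cells = itertools.chain.from_iterable(rows)
regrouped = list(chunked(cells, 4))
print(regrouped)
```

As with the groupby version, the last chunk is shorter when the number of cells is not a multiple of `size`.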
Answer 2 (score: 0)
The following was tested against a fake data file. It works fine for me, but YMMV... See the inline comments for how it works.
import csv

# we open the data file and put its content in data, that is, a list of lists
with open('data.csv', newline='') as csvfile:
    data = [row for row in csv.reader(csvfile)]

# the following idiom transposes a list of lists
transpose = zip(*data)

# I use Python 3, hence zip returns an iterator and I have to throw away,
# using next(), the first element, i.e., the column of the row numbers
next(transpose)

# I enumerate transpose, obtaining the data column by column
for nc, column in enumerate(transpose):

    # I prepare for writing to a csv file
    with open('trans%d.csv' % nc, 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # here, we have an idiom, sort of..., please see
        # http://stupidpythonideas.blogspot.it/2013/08/how-grouper-works.html
        # for the reason why what we enumerate are the rows of your output file
        for nr, row in enumerate(zip(*[iter(column)]*365)):
            writer.writerow([nr+1, *row])
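The `zip(*[iter(column)]*365)` idiom may be easier to follow on a small list. The sketch below uses a chunk size of 3 instead of 365, with made-up values:

```python
column = list(range(1, 10))      # stand-in for one data column

# The list holds three references to the SAME iterator, so zip pulls
# three consecutive items from it to build each output tuple.
it = iter(column)
rows = list(zip(*[it] * 3))
print(rows)

# zip stops at the shortest input, so an incomplete trailing group is dropped.
short = list(zip(*[iter(range(7))] * 3))
print(short)
```

With 912,500 values and a chunk size of 365 the division is exact (2,500 rows), so nothing would be dropped in the question's case.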