I have many CSV files with variable-length rows. For example, the following:
Time,0,8,18,46,132,163,224,238,267,303
X,0,14,14,14,15,16,17,15,15,15
Time,0,4,13,22,32,41,50,59,69,78,87,97,106,115,125,127,137,146,155,165,174,183,192,202,211,220,230,239,248,258,267,277,289,298,308
Y,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Time,0,4,13,22,32,41,50,59,69,78,87,97,106,115,125,127,137,146,155,165,174,183,192,202,211,220,230,239,248,258,267,277,289,298,308
Z,0,1,2,1,1,1,1,1,1,2,2,1,0,1,1,2,2,2,2,2,1,1,2,2,2,1,1,1,1,1,2,2,2,2,2
Time,0,308
W,0,0
becomes:
Time,X,Time,Y,Time,Z,Time,W
0,0,0,0,0,0,0,0
8,14,4,0,4,1,308,0
Most of the data is lost: only the first two values of each row survive.
I want to transpose this CSV in Python. I have the following program:
import csv
import os
from itertools import izip
import sys
try:
    filename = sys.argv[1]
except IndexError:
    print 'Please add a filename'
    exit(-1)
with open(os.path.splitext(filename)[0] + '_t.csv', 'wb') as outfile, open(filename, 'rb') as infile:
    a = izip(*csv.reader(infile))
    csv.writer(outfile).writerows(a)
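However, it seems to cut a lot of data: the file shrinks from 20 KB to 6 KB, and every column is truncated to the length of the shortest row.
How can I transpose the file without dropping any data?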
Answer 0 (score: 1)
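izip stops at the shortest input, so from each row you only get as many values as the shortest row contains.
You should use izip_longest instead. It continues until the longest input is exhausted and puts None wherever a shorter row has no value.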
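The difference can be seen on a few short rows. This is only a minimal sketch, and it assumes Python 3, where izip is the built-in zip and izip_longest is named zip_longest; the sample rows are a shortened version of the data in the question.

```python
from itertools import zip_longest  # Python 3 name for izip_longest

# Shortened sample rows of unequal length (illustrative only).
rows = [
    ["Time", "0", "8", "18"],
    ["X", "0", "14", "14"],
    ["Time", "0", "308"],
    ["W", "0", "0"],
]

# zip stops at the shortest row, so the 4th value of the longer rows is lost.
truncated = list(zip(*rows))

# zip_longest runs to the longest row and pads the short rows with None.
padded = list(zip_longest(*rows))

print(len(truncated))  # 3
print(len(padded))     # 4
print(padded[-1])      # ('18', '14', None, None)
```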
Example:
import csv
import os
from itertools import izip_longest
import sys
try:
    filename = sys.argv[1]
except IndexError:
    print 'Please add a filename'
    exit(-1)
with open(os.path.splitext(filename)[0] + '_t.csv', 'wb') as outfile, open(filename, 'rb') as infile:
    # izip_longest pads short rows with None instead of truncating the long ones
    a = izip_longest(*csv.reader(infile))
    csv.writer(outfile).writerows(a)
With it I got this result:
Time,X,Time,Y,Time,Z,Time,W
0,0,0,0,0,0,0,0
8,14,4,0,4,1,308,0
18,14,13,1,13,2,,
46,14,22,1,22,1,,
132,15,32,1,32,1,,
163,16,41,1,41,1,,
224,17,50,1,50,1,,
238,15,59,1,59,1,,
267,15,69,1,69,1,,
303,15,78,1,78,2,,
,,87,1,87,2,,
,,97,1,97,1,,
,,106,1,106,0,,
,,115,1,115,1,,
,,125,1,125,1,,
,,127,1,127,2,,
,,137,1,137,2,,
,,146,1,146,2,,
,,155,1,155,2,,
,,165,1,165,2,,
,,174,1,174,1,,
,,183,1,183,1,,
,,192,1,192,2,,
,,202,1,202,2,,
,,211,1,211,2,,
,,220,1,220,1,,
,,230,1,230,1,,
,,239,1,239,1,,
,,248,1,248,1,,
,,258,1,258,1,,
,,267,1,267,2,,
,,277,1,277,2,,
,,289,1,289,2,,
,,298,1,298,2,,
,,308,1,308,2,,
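The empty fields in the result come from the None padding: csv.writer writes None as an empty string. A small check of that behaviour, assuming Python 3 and an in-memory buffer in place of a real file:

```python
import csv
import io

buffer = io.StringIO()

# lineterminator is set only to keep the expected string simple;
# the default terminator is '\r\n'.
csv.writer(buffer, lineterminator="\n").writerow(["18", "14", None, None])

line = buffer.getvalue()
print(repr(line))  # '18,14,,\n'
```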
Answer 1 (score: 0)
Here is an approach that does not use itertools.izip:
import csv
with open('transpose.csv') as infile, \
        open('out.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    while True:
        try:
            # read the rows two at a time: a Time row and its data row
            index = next(reader)
            data = next(reader)
        except StopIteration:
            break
        # write each pair as a two-column block
        writer.writerows(zip(index, data))
Given your input, this snippet produces the following out.csv:
Time,X
568,0
573,0
577,1
581,1
585,0
590,2
594,0
599,0
603,0
Time,Y
590,0
594,3
599,3
03,0
Time,Z
599,0
603,1
Is this what you want?
This modified example should match your updated question:
import csv
from itertools import zip_longest  # izip_longest in Python 2
with open('transpose.csv') as infile, \
        open('out.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerows(zip_longest(*reader, fillvalue=0))
Change fillvalue to whatever you want to use in place of the missing values.
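To see the effect of different fill values without touching any files, the same transposition can be run on in-memory text. This is a sketch assuming Python 3; transpose_csv_text is a hypothetical helper name, and the sample is a shortened version of the data in the question.

```python
import csv
import io
from itertools import zip_longest


def transpose_csv_text(text, fillvalue=""):
    """Transpose CSV text, padding short rows with fillvalue (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(zip_longest(*reader, fillvalue=fillvalue))
    return out.getvalue()


# Shortened sample with rows of unequal length (illustrative only).
sample = "Time,0,8,18\nX,0,14,14\nTime,0,308\nW,0,0\n"

blank_filled = transpose_csv_text(sample)                # missing cells left empty
na_filled = transpose_csv_text(sample, fillvalue="NA")   # missing cells marked NA

print(blank_filled)
print(na_filled)
```

An empty string keeps the padded cells visually distinct from real measurements, whereas fillvalue=0 cannot be told apart from a genuine 0 in the data.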