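I have a lot of CSV files laid out in "columns", and I need to preprocess them so that I can eventually index them.
This is time-oriented data with a very large number of columns, one per "device" (up to 128 columns), like this: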
LDEV_XXXXXX.csv
Serial number : XXXXX(VSP)
From : 2014/06/04 05:58
To : 2014/06/05 05:58
sampling rate : 1
"No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)","00:30:02X(X2499-1)"
"242","2014/06/04 10:00",0,0,0
"243","2014/06/04 10:01",0,0,0
"244","2014/06/04 10:02",9,0,0
"245","2014/06/04 10:03",0,0,0
"246","2014/06/04 10:04",0,0,0
"247","2014/06/04 10:05",0,0,0
My goal is to transpose the data (if that is the right term) into rows, so that I can manipulate it more efficiently, for example:
"time",device,value
"2014/06/04 10:00","00:30:00X(X2497-1)",0
"2014/06/04 10:00","00:30:01X(X2498-1)",0
"2014/06/04 10:00","00:30:02X(X2499-1)",0
"2014/06/04 10:01","00:30:00X(X2497-1)",0
"2014/06/04 10:01","00:30:01X(X2498-1)",0
"2014/06/04 10:01","00:30:02X(X2499-1)",0
"2014/06/04 10:02","00:30:00X(X2497-1)",9
"2014/06/04 10:02","00:30:01X(X2498-1)",0
"2014/06/04 10:02","00:30:02X(X2499-1)",0
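And so on...
Note: I have left the raw data as is (with "," as the separator). You will notice that I need to delete the first 6 lines and the "No." column, which is of no interest, but that is not the main goal or the hard part.
I have some Python starter code that transposes CSV data, but it doesn't quite do what I need...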
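The reshaping described above (often called "wide to long", "unpivot" or "melt") can be illustrated on a single row. A minimal sketch, assuming the header line and one data line have already been split into lists; the function name `melt_row` is hypothetical:

```python
def melt_row(header, row):
    """Turn one wide row into (time, device, value) triples.

    header: the column names, e.g. ["No.", "time", dev1, dev2, ...]
    row:    the matching values, e.g. ["244", "2014/06/04 10:02", "9", "0", ...]
    The "No." column (index 0) is dropped and the time (index 1)
    is repeated once per device column.
    """
    time = row[1]
    return [(time, device, value) for device, value in zip(header[2:], row[2:])]


header = ["No.", "time", "00:30:00X(X2497-1)", "00:30:01X(X2498-1)", "00:30:02X(X2499-1)"]
row = ["244", "2014/06/04 10:02", "9", "0", "0"]
for triple in melt_row(header, row):
    print(triple)
```

Each input row yields one output row per device, so a file with N timestamps and M devices produces N × M output rows.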
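import csv
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

# Read every row of the input file into memory
with open(infile) as f:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        cols.append(row)

# Output row i is built from column i of every input row (a plain matrix
# transpose), padding short rows with ''.
# 'wb' is the Python 2 idiom; on Python 3 use open(outfile, 'w', newline='')
with open(outfile, 'wb') as f:
    writer = csv.writer(f)
    for i in range(len(max(cols, key=len))):
        writer.writerow([(c[i] if i < len(c) else '') for c in cols])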
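Note that the number of columns is arbitrary: from a few up to 128, depending on the file.
I'm pretty sure this is a common need, but I haven't been able to find the exact Python code to do it, or I couldn't get it to work...
EDIT:
To be more precise:
each timestamp row will be repeated once per device, so the file will have many more rows (multiplied by the number of devices) but only a few columns (time, device, value). The expected final result has been updated :-)
EDIT:
I would like to be able to pass the input file as argument 1 and the output file as argument 2 :-)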
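Answer 0 (score: 2)
First, get the data into the structure you want; then you can easily write it out. Also, for CSV files with a complex structure, it is often more useful to open them with DictReader.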
from csv import DictReader, DictWriter

# csv_path: path to the input file, with the preamble lines already removed
with open(csv_path) as f:
    table = list(DictReader(f, restval=''))

transformed = []
for row in table:
    # Python 2: row.viewkeys(); on Python 3, row.keys() supports set operations directly
    devices = [d for d in row.keys() - {'time', 'No.'}]
    time_rows = [{'time': row['time']} for i in range(len(devices))]
    for i, d in enumerate(devices):
        time_rows[i].update({'device': d, 'value': row[d]})
    transformed += time_rows
This produces a list like:
[{'device': '00:30:00X(X2497-1)', 'value': '0', 'time': '2014/06/04 10:00'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:00'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:00'},
{'device': '00:30:00X(X2497-1)', 'value': '0', 'time': '2014/06/04 10:01'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:01'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:01'},
{'device': '00:30:00X(X2497-1)', 'value': '9', 'time': '2014/06/04 10:02'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:02'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:02'},
{'device': '00:30:00X(X2497-1)', 'value': '0', 'time': '2014/06/04 10:03'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:03'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:03'},
{'device': '00:30:00X(X2497-1)', 'value': '0', 'time': '2014/06/04 10:04'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:04'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:04'},
{'device': '00:30:00X(X2497-1)', 'value': '0', 'time': '2014/06/04 10:05'},
{'device': '00:30:02X(X2499-1)', 'value': '0', 'time': '2014/06/04 10:05'},
{'device': '00:30:01X(X2498-1)', 'value': '0', 'time': '2014/06/04 10:05'}]
This is exactly what we want. To write it back out, you can then use DictWriter:
# you might sort transformed here so that it gets written out in whatever order you like
column_names = ['time', 'device', 'value']
# newline='' is what the csv docs recommend on Python 3 (open with 'wb' on Python 2)
with open(out_path, 'w', newline='') as f:
    writer = DictWriter(f, column_names)
    writer.writeheader()
    writer.writerows(transformed)
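On Python 3 the `viewkeys()` call above has to become `keys()`, and the set difference returns the devices in arbitrary order (visible in the printed list). A minimal Python 3 sketch that avoids both issues, assuming the header line starts with `"No."`: it skips the preamble with `itertools.dropwhile` instead of deleting lines by hand, takes the device order from `DictReader.fieldnames`, and writes with `DictWriter` as above. The function names `transform` and `write_long` are hypothetical:

```python
import csv
import io
from itertools import dropwhile


def transform(lines):
    """Unpivot the device columns into {'time', 'device', 'value'} dicts.

    Lines before the '"No.",...' header (serial number, dates, ...) are
    skipped, and devices are emitted in header order instead of set order.
    """
    # dropwhile discards the preamble; DictReader then uses the "No." line as its header
    body = dropwhile(lambda line: not line.startswith('"No."'), lines)
    reader = csv.DictReader(body)
    if reader.fieldnames is None:
        raise ValueError('header line not found')
    devices = [name for name in reader.fieldnames if name not in ('No.', 'time')]
    transformed = []
    for row in reader:
        for device in devices:
            transformed.append({'time': row['time'], 'device': device, 'value': row[device]})
    return transformed


def write_long(transformed, f):
    """Write the dicts with DictWriter; f should be opened with newline=''."""
    writer = csv.DictWriter(f, ['time', 'device', 'value'])
    writer.writeheader()
    writer.writerows(transformed)


sample = io.StringIO(
    'Serial number : XXXXX(VSP)\n'
    'From : 2014/06/04 05:58\n'
    '"No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)"\n'
    '"242","2014/06/04 10:00",0,0\n'
    '"244","2014/06/04 10:02",9,0\n'
)
out = io.StringIO()  # stands in for open(out_path, 'w', newline='')
write_long(transform(sample), out)
print(out.getvalue())
```

Because rows come out in file order and devices in header order, no extra sort is needed before writing.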
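Answer 1 (score: 1)
EDIT: now expects the quotes (") around No., ported the code to Python 2 (noting what changes for Python 3) and removed a debug print
EDIT2: fixed a silly bug where the index was not incremented
EDIT3: the new version allows the input file to contain multiple headers, each followed by its data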
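I'm not sure the csv module is worth using here: your separator is fixed, no quoted field contains a comma, and no field contains a newline, so line.strip().split(',') is enough.
Here is my attempt:
Code for Python 2 (for Python 3, the first line, from __future__ import print_function, can be removed):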
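The difference between the two approaches shows up on a data line from the question: splitting on commas keeps the surrounding quotes (which happens to match the quoted time and device columns in the desired output), whereas `csv.reader` removes them. A small comparison:

```python
import csv

line = '"244","2014/06/04 10:02",9,0,0\n'

# Plain split: the quote characters stay part of the field values
by_split = line.strip().split(',')

# csv.reader: quotes are removed, and quoted fields could even contain commas
by_csv = next(csv.reader([line]))

print(by_split)
print(by_csv)
```

The split approach only stays correct as long as no quoted field ever contains a comma; if that assumption breaks, `csv.reader` is the safer choice.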
from __future__ import print_function

class transposer(object):
    def _skip_preamble(self):
        # Skip lines until the '"No."' column header, and keep the device names
        for line in self.fin:
            if line.strip().startswith('"No."'):
                self.keys = line.strip().split(',')[2:]
                return
        raise Exception('Initial line not found')

    def _do_loop(self):
        # Emit one "time,device,value" line per device for every data line
        for line in self.fin:
            line = line.strip()
            if not line:
                continue  # ignore blank lines (e.g. a trailing newline at end of file)
            elts = line.split(',')
            dat = elts[1]
            ix = 0
            for val in elts[2:]:
                print(dat, self.keys[ix], val, sep=',', file=self.out)
                ix += 1

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)

    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self._skip_preamble()
        self._do_loop()
Usage:
t = transposer()
t.transpose('in', 'out')
If the input file contains multiple headers, the list of keys must be reset at each header:
from __future__ import print_function

class transposer(object):
    def _do_loop(self):
        line_number = 0
        for line in self.fin:
            line_number += 1
            line = line.strip()
            if line.startswith('"No."'):
                # New header: reset the list of device names
                self.keys = line.split(',')[2:]
            elif line.startswith('"'):
                # Data line: check that it has one value per device before emitting
                elts = line.split(',')
                if len(elts) == (len(self.keys) + 2):
                    dat = elts[1]
                    ix = 0
                    for val in elts[2:]:
                        print(dat, self.keys[ix], val, sep=',', file=self.out)
                        ix += 1
                else:
                    raise Exception("Syntax error line %d expected %d values found %d"
                                    % (line_number, len(self.keys), len(elts) - 2))
            # any other line (preamble, blank line) is ignored

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)

    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self.keys = []
        self._do_loop()
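The question also asks for the input and output files to be given as command-line arguments. A minimal Python 3 sketch of a standalone script doing that with the same split-based approach; the function name `transpose_lines` and the script name `transpose.py` are hypothetical, and `zip` replaces the manual `ix` counter:

```python
import sys


def transpose_lines(lines):
    """Yield 'time,device,value' lines from the wide, column-per-device lines.

    Same split-based parsing as the classes above: lines before the first
    '"No."' header are skipped, every '"No."' header (re)defines the device
    names, and the "No." column is dropped.
    """
    keys = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. at the end of the file
        elts = line.split(',')
        if line.startswith('"No."'):
            keys = elts[2:]  # device names from this header
            continue
        if keys is None:
            continue  # still in the preamble (serial number, dates, ...)
        # zip pairs each device name with its value
        for key, val in zip(keys, elts[2:]):
            yield ','.join((elts[1], key, val))
    if keys is None:
        raise ValueError('Initial line not found')


# Only run as a script when called with exactly two arguments: infile outfile
# e.g. python transpose.py LDEV_XXXXXX.csv out.csv
if __name__ == '__main__' and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin, open(sys.argv[2], 'w') as fout:
        for out_line in transpose_lines(fin):
            print(out_line, file=fout)
```

Unlike the second class, `zip` silently drops values when a line is shorter or longer than its header; keep the explicit length check if malformed lines should be reported.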