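Hi friends, this may have been asked before, but making changes to the same file is a bit tedious for me. Data with different parameters is added here every second. I tried using awk, sed, and python, but I don't know which technique to use. So here are my sample files and the logic.
File1 f1.csv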
P, V, TS
p1, 12, 10:10:00
p2, 34, 10:21:00
p1, 12, 10:21:00
p2, 34, 10:22:00
p3, 60, 10:36:00
p1, 60, 10:35:00
p4, 22, 10:38:00
p1, 60, 10:40:00
Current (File2 f2.csv)
P, V, RTS, UTS
p1, 12, 10:00:00, 10:10:00
p2, 34, 10:18:00, 10:20:00
p1, 54, 10:20:00, 10:21:00
p2, 54, 10:22:00, 10:24:00
p3, 60, 10:31:00, 10:31:00
# The output can be written back to the same file (f2.csv) after the changes, or you can create a third file (f3.csv) for the output, using f2.csv as the reference.
Expected (file f2.csv) / output file (f3.csv)
P, V, RTS, UTS
p1 12 10:10:00 10:21:00
p2 34 10:18:00 10:22:00
p1 54 10:20:00 10:21:00
p2 54 10:22:00 10:24:00
p3 60 10:31:00 10:36:00
p1 60 10:35:00 10:40:00
p4 22 10:38:00 10:38:00
Logic (pseudocode)
If (P,V from f1)==(P,V from f2)
{
UTS from f2=TS from f1
}
elseif((P,V from f1)!=(P,V from f2))
{
RTS from f2=TS from f1
UTS from f2=TS from f1
}
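Answer 0 (score: 0)
I haven't understood what your script is for, so I'll suggest a script that matches your pseudocode exactly.
Let's say your data files look like this: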
> File f1.csv
# P, V, TS
p1, 12, 10:10:00
p1, 22, 10:15:00
p2, 34, 10:20:00
p1, 54, 10:21:00
p2, 54, 10:22:00
p4, 54, 10:25:00
p3, 60, 10:31:00
p1, 45, 10:35:00
> File f2.csv
# P, V, RTS, UTS
p1, 12, 10:00:00, 10:10:00
p1, 22, 10:15:00, 10:15:00
p2, 34, 10:18:00, 10:20:00
p1, 54, 10:20:00, 10:21:00
p2, 54, 10:22:00, 10:24:00
p4, 54, 10:25:00, 10:26:00
p3, 60, 10:31:00, 10:31:00
p4, 45, 10:35:00, 10:35:00
The script you are looking for is the following:
import numpy as np

fn1 = './f1.csv'
fn2 = './f2.csv'

# genfromtxt reads the header and infers a type for each column,
# which makes it more suitable here than loadtxt.
# Each file is loaded as a structured array whose fields are named
# after the header columns. autostrip=True removes the space after
# each comma; encoding makes the text columns load as str on Python 3.
t1 = np.genfromtxt(fn1, delimiter=',', comments='#', names=True,
                   dtype=None, autostrip=True, encoding='utf-8')
t2 = np.genfromtxt(fn2, delimiter=',', comments='#', names=True,
                   dtype=None, autostrip=True, encoding='utf-8')

# Here you can write whatever conditions you want for modifying the
# table t2, which is then saved back to the original file fn2.
for i in range(min(len(t1), len(t2))):
    if t1['P'][i] == t2['P'][i] and t1['V'][i] == t2['V'][i]:
        t2['UTS'][i] = t1['TS'][i]
    else:
        t2['RTS'][i] = t2['UTS'][i] = t1['TS'][i]

# The header argument needs NumPy 1.7 or later. savetxt prefixes the
# header with the comments string, so the first line of the file is
# "# P, V, RTS, UTS".
np.savetxt(fn2, t2, fmt=('%s', ' %d', ' %s', ' %s'), delimiter=',',
           header='P, V, RTS, UTS', comments='# ')
Answer 1 (score: 0)
First, you can simplify the pseudocode from:
If (P,V from f1)==(P,V from f2)
{
UTS from f2=TS from f1
}
elseif((P,V from f1)!=(P,V from f2))
{
RTS from f2=TS from f1
UTS from f2=TS from f1
}
to:
UTS from f2=TS from f1 # this is always executed anyway
If((P,V from f1)!=(P,V from f2))
{
RTS from f2=TS from f1
}
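The simplified rule can be sketched in Python as a small function over one pair of rows. This is only an illustration: the name apply_update and the use of plain dicts for rows are assumptions, not part of the question.

```python
def apply_update(f1_row, f2_row):
    """Return a copy of f2_row updated from f1_row using the simplified rule."""
    updated = dict(f2_row)
    # UTS is always overwritten with the timestamp from f1.
    updated['UTS'] = f1_row['TS']
    # RTS is overwritten only when the (P, V) pair differs.
    if (f1_row['P'], f1_row['V']) != (f2_row['P'], f2_row['V']):
        updated['RTS'] = f1_row['TS']
    return updated


# Same (P, V): only UTS changes.
same = apply_update(
    {'P': 'p1', 'V': '12', 'TS': '10:10:00'},
    {'P': 'p1', 'V': '12', 'RTS': '10:00:00', 'UTS': '10:05:00'},
)
# Different (P, V): both RTS and UTS take the f1 timestamp.
changed = apply_update(
    {'P': 'p1', 'V': '22', 'TS': '10:15:00'},
    {'P': 'p1', 'V': '12', 'RTS': '10:00:00', 'UTS': '10:05:00'},
)
print(same)
print(changed)
```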
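Next, you have to remove the spaces between the columns in the csv files (at least in the header). This is needed because otherwise csv.DictReader will load the space as part of the header name, which is not what you want.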
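A quick way to see the header problem, using an in-memory sample rather than the real files. As a possible alternative to editing the files, the csv module's skipinitialspace option drops whitespace that immediately follows a delimiter; whether that suits your data is an assumption worth checking.

```python
import csv
import io

# Sample text in the spaced format from the question.
raw = "P, V, TS\np1, 12, 10:10:00\n"

# Default dialect: the space after each comma stays in the field names.
default_reader = csv.DictReader(io.StringIO(raw))
default_names = default_reader.fieldnames

# skipinitialspace=True ignores whitespace right after each delimiter,
# in the header row as well as in the data rows.
stripped_reader = csv.DictReader(io.StringIO(raw), skipinitialspace=True)
stripped_names = stripped_reader.fieldnames
first_row = next(stripped_reader)

print(default_names)   # ['P', ' V', ' TS']
print(stripped_names)  # ['P', 'V', 'TS']
```

The rest of this answer assumes the spaces have been removed from the files themselves.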
So you end up with files in the following format:
f1.csv
P,V,TS
p1,12,10:10:00
p2,34,10:20:00
p1,54,10:21:00
p2,54,10:22:00
p3,60,10:31:00
f2.csv
P,V,RTS,UTS
p1,12,10:00:00,10:10:00
p2,34,10:18:00,10:20:00
p1,54,10:20:00,10:21:00
p2,54,10:22:00,10:24:00
p3,60,10:31:00,10:31:00
Then in Python you can use the csv module:
import csv

f2_output = []
# newline='' is what the csv module expects for files on Python 3
with open('f1.csv', newline='') as f1, open('f2.csv', newline='') as f2:
    f1_reader = csv.DictReader(f1, delimiter=',')
    f2_reader = csv.DictReader(f2, delimiter=',')
    for f1row in f1_reader:
        try:
            f2row = next(f2_reader)
        except StopIteration:
            # basic check to ensure the number of rows is the same
            raise Exception('Too many rows on f1!')
        # UTS is always updated
        f2row['UTS'] = f1row['TS']
        # RTS is updated only when (P, V) differs
        if f1row['P'] != f2row['P'] or f1row['V'] != f2row['V']:
            f2row['RTS'] = f1row['TS']
        f2_output.append(f2row)
    # basic check to ensure that all rows on f2 were processed
    if next(f2_reader, None) is not None:
        raise Exception('Too many rows on f2!')

header = ['P', 'V', 'RTS', 'UTS']
with open('f2_out.csv', 'w', newline='') as f2_out:
    f2_out_writer = csv.DictWriter(f2_out, header, delimiter=',')
    # writeheader() is available from Python 2.7 onwards
    f2_out_writer.writeheader()
    f2_out_writer.writerows(f2_output)
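The question also allows the result to go back into f2.csv itself. One way to do that, sketched here under assumptions, is to write to a temporary file in the same directory and then swap it into place, so that a failure halfway through does not leave f2.csv truncated. The helper name replace_csv is hypothetical.

```python
import csv
import os
import tempfile


def replace_csv(path, header, rows):
    """Write rows to a temp file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as tmp:
            writer = csv.DictWriter(tmp, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the script above, the call would be replace_csv('f2.csv', header, f2_output) in place of the final with block.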