Compare values in two csv files (f1 and f2) and make updates in the second file (f2)

Time: 2014-05-30 08:19:11

Tags: shell csv awk

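Hi friends, this question may have been asked before, but making changes to the same file is a bit tedious for me. Here, data with different parameters is added every second. I tried awk and sed but am not sure which technique to use. Here are my sample files and the logic.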

File1 f1.csv

Current (File2 f2.csv)

P,       V,     TS
p1,     12,     10:10:00
p2,     34,     10:21:00
p1,     12,     10:21:00
p2,     34,     10:22:00   
p3,     60,     10:36:00
p1,     60,     10:35:00
p4,     22,     10:38:00
p1,     60,     10:40:00

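# The output can be written to the same file (f2.csv) after the changes, or you can create a third file (f3.csv) for the output, with reference to file f2.csv

Expected (file f2.csv) / output file (file f3)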

P,      V,      RTS,         UTS
p1,     12,    10:00:00,    10:10:00    
p2,     34,    10:18:00,    10:20:00
p1,     54,    10:20:00,    10:21:00
p2,     54,    10:22:00,    10:24:00
p3,     60,    10:31:00,    10:31:00

Logic (pseudocode)

P,      V,     RTS,         UTS
p1      12    10:10:00      10:21:00
p2      34    10:18:00      10:22:00
p1      54    10:20:00      10:21:00
p2      54    10:22:00      10:24:00
p3      60    10:31:00      10:36:00
p1      60    10:35:00      10:40:00   
p4      22    10:38:00      10:38:00   

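2 Answers:

Answer 0: (score: 0)

I don't fully understand the purpose of your script, so I'll suggest one that matches your pseudocode exactly.

Let's say your data files look like this: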

> File f1.csv
    # P, V, TS
    p1, 12, 10:10:00
    p1, 22, 10:15:00
    p2, 34, 10:20:00
    p1, 54, 10:21:00
    p2, 54, 10:22:00
    p4, 54, 10:25:00
    p3, 60, 10:31:00
    p1, 45, 10:35:00

> File f2.csv
    # P, V, RTS, UTS
    p1, 12, 10:00:00, 10:10:00
    p1, 22, 10:15:00, 10:15:00
    p2, 34, 10:18:00, 10:20:00
    p1, 54, 10:20:00, 10:21:00
    p2, 54, 10:22:00, 10:24:00
    p4, 54, 10:25:00, 10:26:00
    p3, 60, 10:31:00, 10:31:00
    p4, 45, 10:35:00, 10:35:00

The script you are looking for is as follows:

import numpy as np

fn1 = './f1.csv'
fn2 = './f2.csv'

# genfromtxt loads the file and infers the column types (dtype=None).
# In this case, it is more suitable than loadtxt.
# names=True reads the field names from the "# P, V, ..." header line, so
# each file is loaded as a structured array whose columns are accessed by name.
# encoding='utf-8' (NumPy 1.14 or later) loads text columns as str, not bytes.
t1 = np.genfromtxt(fn1, delimiter=',', comments="#",
                   names=True, dtype=None, encoding='utf-8')

t2 = np.genfromtxt(fn2, delimiter=',', comments="#",
                   names=True, dtype=None, encoding='utf-8')

# Here you can write the conditions you want to use to modify the table t2,
# which will be saved back to the original file fn2.
# Rows are compared by position: row i of f1 against row i of f2.
for i in range(min(len(t1), len(t2))):
    if t1['P'][i] == t2['P'][i] and t1['V'][i] == t2['V'][i]:
        t2['UTS'][i] = t1['TS'][i]
    else:
        t2['RTS'][i] = t2['UTS'][i] = t1['TS'][i]

# The header argument needs NumPy 1.7 or later. savetxt prefixes the header
# with the comment marker '# ' by default, so it is not repeated here.
np.savetxt(fn2, t2, fmt=('%s', ' %d', '%s', '%s'), delimiter=',',
           header="P, V, RTS, UTS")

Answer 1: (score: 0)

First, you can simplify the pseudocode from
If (P,V from f1)==(P,V from f2)
{
    UTS from f2=TS from f1
}
elseif((P,V from f1)!=(P,V from f2))
{
    RTS from f2=TS from f1
    UTS from f2=TS from f1
} 

to:

UTS from f2=TS from f1 # this is always executed anyway
If((P,V from f1)!=(P,V from f2))
{
    RTS from f2=TS from f1
}
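As a rough illustration, the simplified pseudocode above could be written as a small Python function. The name `update_row` and the dict-based rows are assumptions made for this sketch, not something from the question.

```python
def update_row(f1_row, f2_row):
    """Return a copy of f2_row updated from f1_row (hypothetical helper)."""
    updated = dict(f2_row)
    # UTS is always taken from f1's TS.
    updated["UTS"] = f1_row["TS"]
    # RTS is only replaced when the (P, V) pair differs.
    if (f1_row["P"], f1_row["V"]) != (f2_row["P"], f2_row["V"]):
        updated["RTS"] = f1_row["TS"]
    return updated


same = update_row(
    {"P": "p1", "V": "12", "TS": "10:10:00"},
    {"P": "p1", "V": "12", "RTS": "10:00:00", "UTS": "10:05:00"},
)
changed = update_row(
    {"P": "p1", "V": "22", "TS": "10:15:00"},
    {"P": "p1", "V": "12", "RTS": "10:00:00", "UTS": "10:10:00"},
)
print(same)
print(changed)
```

In the first call only UTS changes; in the second, both RTS and UTS take the TS value from f1.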

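Next, you have to remove the spaces between the columns in the csv files (at least in the header). This is needed because otherwise csv.DictReader loads the spaces as part of the header names, which is not what you want. So you have files in the following format: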

f1.csv

P,V,TS
p1,12,10:10:00
p2,34,10:20:00
p1,54,10:21:00
p2,54,10:22:00
p3,60,10:31:00

f2.csv

P,V,RTS,UTS
p1,12,10:00:00,10:10:00
p2,34,10:18:00,10:20:00
p1,54,10:20:00,10:21:00
p2,54,10:22:00,10:24:00
p3,60,10:31:00,10:31:00
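If editing the files by hand is not practical, one possible alternative is the csv module's `skipinitialspace` option, which drops whitespace that directly follows a delimiter. The sketch below uses an in-memory sample in the question's spacing; the `raw` string is made up for illustration.

```python
import csv
import io

# Sample in the question's layout: padding after each comma,
# plus trailing spaces on the last line.
raw = (
    "P,       V,     TS\n"
    "p1,     12,     10:10:00\n"
    "p2,     34,     10:22:00   \n"
)

reader = csv.DictReader(io.StringIO(raw), skipinitialspace=True)
# skipinitialspace only handles whitespace right after a delimiter,
# so trailing whitespace is stripped explicitly here.
rows = [{key: value.strip() for key, value in row.items()} for row in reader]

print(reader.fieldnames)
print(rows)
```

With this option the header names come out as `P`, `V` and `TS` without the padding, so the files would not need to be rewritten first.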

Then in Python you can use the csv module:

import csv

f2_output = []
# newline='' is what the csv module expects for files opened in text mode
with open('f1.csv', newline='') as f1, open('f2.csv', newline='') as f2:
    f1_reader = csv.DictReader(f1, delimiter=',')
    f2_reader = csv.DictReader(f2, delimiter=',')
    for f1row in f1_reader:
        try:
            f2row = next(f2_reader)
        except StopIteration:
            # basic check to ensure the number of rows is the same
            raise Exception('Too many rows on f1!')
        # UTS is always updated
        f2row['UTS'] = f1row['TS']
        # RTS is updated only when the (P, V) pair differs
        if f1row['P'] != f2row['P'] or f1row['V'] != f2row['V']:
            f2row['RTS'] = f1row['TS']
        f2_output.append(f2row)

    # basic check to ensure that all rows on f2 were processed
    if next(f2_reader, None) is not None:
        raise Exception('Too many rows on f2!')

header = ['P', 'V', 'RTS', 'UTS']
with open('f2_out.csv', 'w', newline='') as f2_out:
    f2_out_writer = csv.DictWriter(f2_out, header, delimiter=',')
    f2_out_writer.writeheader()
    f2_out_writer.writerows(f2_output)
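The question also allows the result to go back into f2.csv itself. A minimal sketch of one way to do that is to write to a temporary file in the same directory and then swap it in with `os.replace`. The helper name `write_rows_in_place` is hypothetical, and this assumes nothing else is appending to the file while it runs.

```python
import csv
import os
import tempfile


def write_rows_in_place(path, fieldnames, rows):
    """Rewrite a csv file via a temporary file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp:
            writer = csv.DictWriter(tmp, fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # Swap the finished file in place of the original.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temporary file if anything went wrong.
        os.unlink(tmp_path)
        raise
```

With the script above this could be called as `write_rows_in_place('f2.csv', header, f2_output)` instead of writing `f2_out.csv`. Note that the new file gets the temporary file's permissions rather than those of the original.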