I have two CSV files that I want to merge.
File1:
rel_id, acc_id, value, timestamp
1, 2, True, 2016-01-04 19:20:22
2, 3, True, 2016-01-04 18:35:56
1, 2, True, 2016-01-04 20:43:12
1, 5, False, 2016-01-04 18:15:20
2, 3, True, 2016-01-04 20:43:11
File2:
rel_id, acc_id, value, timestamp
1, 2, 250, 2016-01-04 20:43:13
1, 5, 610, 2016-01-04 18:15:23
2, 3, 400, 2016-01-04 18:35:58
2, 3, 300, 2016-01-04 20:43:13
1, 2, 500, 2016-01-04 19:20:23
I want to merge the two files on rel_id, acc_id and timestamp.
Expected result of merging file1 and file2:
rel_id, acc_id, value_file1, timestamp, value_file2
1, 2, True, 2016-01-04 19:20:22, 500
2, 3, True, 2016-01-04 18:35:56, 400
1, 2, True, 2016-01-04 20:43:12, 250
1, 5, False, 2016-01-04 18:15:20, 610
2, 3, True, 2016-01-04 20:43:11, 300
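However, the timestamps in file2 are slightly later than the matching ones in file1.
Searching Stack Overflow led me to this post: pandas merge dataframes by closest time
But I don't know how to match on the same rel_id and acc_id and, at the same time, on the nearest timestamp.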
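To see why a plain merge is not enough, here is a minimal sketch that loads the sample rows above through io.StringIO (standing in for file1.csv and file2.csv) and merges on all three keys. The skipinitialspace=True option and the _file1/_file2 suffixes are choices made for this sketch, not part of the original code.

```python
import io

import pandas as pd

# The sample rows from the question, standing in for file1.csv and file2.csv.
FILE1 = """rel_id, acc_id, value, timestamp
1, 2, True, 2016-01-04 19:20:22
2, 3, True, 2016-01-04 18:35:56
1, 2, True, 2016-01-04 20:43:12
1, 5, False, 2016-01-04 18:15:20
2, 3, True, 2016-01-04 20:43:11
"""
FILE2 = """rel_id, acc_id, value, timestamp
1, 2, 250, 2016-01-04 20:43:13
1, 5, 610, 2016-01-04 18:15:23
2, 3, 400, 2016-01-04 18:35:58
2, 3, 300, 2016-01-04 20:43:13
1, 2, 500, 2016-01-04 19:20:23
"""

# skipinitialspace=True drops the blank after each comma, so the columns are
# named "acc_id" rather than " acc_id" and "True" parses as a boolean.
file1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True, parse_dates=["timestamp"])
file2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True, parse_dates=["timestamp"])

# An exact merge on all three keys matches nothing: every file2 timestamp is
# 1 to 3 seconds later than its file1 counterpart.
exact = file1.merge(file2, on=["rel_id", "acc_id", "timestamp"],
                    suffixes=("_file1", "_file2"))
print(len(exact))  # 0
```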
import pandas as pd
file1 = pd.read_csv('file1.csv')
file2 = pd.read_csv('file2.csv')
# Use clean column names for both files
file1.columns = ['rel_id', 'acc_id', 'value', 'timestamp']
file2.columns = ['rel_id', 'acc_id', 'value', 'timestamp']
file1['timestamp'] = pd.to_datetime(file1['timestamp'])
file2['timestamp'] = pd.to_datetime(file2['timestamp'])
# Series of file1 timestamps indexed by themselves (in file order, i.e. unsorted)
file1_dt = pd.Series(file1["timestamp"].values, file1["timestamp"])
# For each file2 timestamp, look up the nearest file1 timestamp
file2["nearest"] = file1_dt.reindex(file2["timestamp"], method="nearest").values
print(file2)
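I tried the code above based on other posts, but it does not match on rel_id and acc_id yet. On top of that, it already raises an error:
ValueError: index must be monotonic increasing or decreasing
Any help would be much appreciated. Thanks.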
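A minimal sketch reproducing this error with three of the file1 timestamps (the values 0, 1, 2 are placeholders): reindex(..., method="nearest") on an index that is not sorted raises the ValueError, while calling sort_index() first makes the lookup work.

```python
import pandas as pd

# Three file1 timestamps in file order, i.e. NOT sorted.
index = pd.to_datetime(["2016-01-04 19:20:22", "2016-01-04 18:35:56", "2016-01-04 20:43:12"])
s = pd.Series([0, 1, 2], index=index)
target = pd.to_datetime(["2016-01-04 18:35:58"])  # no exact match in `index`

error_message = None
try:
    s.reindex(target, method="nearest")
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Sorting first gives a monotonic index, so the nearest-label lookup works.
fixed = s.sort_index().reindex(target, method="nearest")
print(fixed.iloc[0])  # 1, the value stored at 18:35:56
```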
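Answer 0 (score: 0)
You are trying to reindex with an unsorted index, but method='nearest' needs a sorted (monotonic) index, so sort first. Assuming your CSV files have no header row (if they do, as in the question, pass header=0 instead of header=None so that names replaces the existing header):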
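import pandas as pd
column_names = ['rel_id', 'acc_id', 'value', 'timestamp']
# Parse the timestamp column into a DatetimeIndex and sort it;
# skipinitialspace=True drops the blank after each comma
file1 = pd.read_csv('file1.csv',
                    index_col='timestamp',
                    parse_dates=['timestamp'],
                    header=None,
                    names=column_names,
                    skipinitialspace=True).sort_index()
file2 = pd.read_csv('file2.csv',
                    index_col='timestamp',
                    parse_dates=['timestamp'],
                    header=None,
                    names=column_names,
                    skipinitialspace=True).sort_index()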
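# reindex() returns a frame labelled by file2.index, so this assigns
# file2's timestamps to file1's rows by position
file1.set_index(file1.reindex(file2.index, method='nearest').index, inplace=True)
print(file1)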
                     rel_id  acc_id  value
timestamp
2016-01-04 18:15:23       1       5  False
2016-01-04 18:35:58       2       3   True
2016-01-04 19:20:23       1       2   True
2016-01-04 20:43:13       2       3   True
2016-01-04 20:43:13       1       2   True
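It is worth checking what the set_index line actually does. reindex() always labels its result with the target index, so the .index it produces is just file2.index, and set_index() stamps those timestamps onto file1's rows by position. The sketch below rebuilds the sorted timestamps from both files (ids taken from the output above) to confirm this, and compares it with Index.get_indexer(..., method="nearest"), which returns the positions of the file1 rows that are nearest in time.

```python
import pandas as pd

# file1 and file2 timestamps after sort_index(), as in the answer.
file1_index = pd.to_datetime([
    "2016-01-04 18:15:20", "2016-01-04 18:35:56", "2016-01-04 19:20:22",
    "2016-01-04 20:43:11", "2016-01-04 20:43:12",
])
file2_index = pd.to_datetime([
    "2016-01-04 18:15:23", "2016-01-04 18:35:58", "2016-01-04 19:20:23",
    "2016-01-04 20:43:13", "2016-01-04 20:43:13",
])
file1 = pd.DataFrame({"rel_id": [1, 2, 1, 2, 1], "acc_id": [5, 3, 2, 3, 2]},
                     index=file1_index)

# reindex() labels its result with the *target* index: this is just file2_index.
relabelled = file1.reindex(file2_index, method="nearest").index
print(relabelled.equals(file2_index))  # True

# Positions of the file1 rows nearest in time to each file2 timestamp.
nearest_pos = file1.index.get_indexer(file2_index, method="nearest")
print(nearest_pos.tolist())  # [0, 1, 2, 4, 4]
```

By time alone, both 20:43:13 rows map to the same file1 row (position 4), so the answer's result is correct here because the two sorted files happen to line up row for row, not because of the nearest-timestamp lookup.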
Then merge file1 and file2:
merged = file1.reset_index().merge(file2.reset_index(), on=['acc_id', 'rel_id', 'timestamp'],
                                   suffixes=('_file1', '_file2')).set_index('timestamp')
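A different technique, not the one used in this answer: pandas.merge_asof with by= matches rel_id and acc_id exactly and timestamp by proximity, which is what the question asks for. This is a minimal sketch, assuming pandas 0.20 or newer (needed for direction="nearest") and the sample rows inlined through io.StringIO; the 10-second tolerance is an assumed cutoff, not something stated in the question.

```python
import io

import pandas as pd

# The sample rows from the question, standing in for file1.csv and file2.csv.
FILE1 = """rel_id, acc_id, value, timestamp
1, 2, True, 2016-01-04 19:20:22
2, 3, True, 2016-01-04 18:35:56
1, 2, True, 2016-01-04 20:43:12
1, 5, False, 2016-01-04 18:15:20
2, 3, True, 2016-01-04 20:43:11
"""
FILE2 = """rel_id, acc_id, value, timestamp
1, 2, 250, 2016-01-04 20:43:13
1, 5, 610, 2016-01-04 18:15:23
2, 3, 400, 2016-01-04 18:35:58
2, 3, 300, 2016-01-04 20:43:13
1, 2, 500, 2016-01-04 19:20:23
"""

file1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True, parse_dates=["timestamp"])
file2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True, parse_dates=["timestamp"])

# merge_asof requires both frames to be sorted by the "on" key.
file1 = file1.sort_values("timestamp")
file2 = file2.sort_values("timestamp")

merged = pd.merge_asof(
    file1,
    file2,
    on="timestamp",
    by=["rel_id", "acc_id"],        # only pair rows with the same ids
    direction="nearest",            # closest timestamp, earlier or later
    tolerance=pd.Timedelta("10s"),  # assumed cutoff; farther rows get NaN
    suffixes=("_file1", "_file2"),
)
print(merged)
```

The columns come out as rel_id, acc_id, value_file1, timestamp, value_file2, the same layout as the expected result in the question, with rows ordered by timestamp instead of by original file order. Unlike the positional set_index trick, each file1 row is paired with the closest file2 row that has the same rel_id and acc_id.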