This is my filter condition (look back one day from each timestamp), called df1:
customer_id timestamp
1 2018-06-03 17:56:52
2 2018-06-03 18:42:51
This is the main dataset, called df2:
transaction_id customer_id timestamp
1 1 2018-06-02 09:56:23
2 1 2018-06-03 02:56:52
3 1 2018-06-03 12:56:52
4 2 2018-06-03 12:40:51
5 2 2018-06-03 18:40:51
6 2 2018-06-03 18:48:50
What I want is:
transaction_id customer_id timestamp
2 1 2018-06-03 02:56:52
3 1 2018-06-03 12:56:52
4 2 2018-06-03 12:40:51
5 2 2018-06-03 18:40:51
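This is because for customer_id = 1 the filter window should run from 2018-06-02 17:56:52 to 2018-06-03 17:56:52, and for customer_id = 2 it should run from 2018-06-02 18:42:51 to 2018-06-03 18:42:51. Transaction 1 falls before customer 1's window and transaction 6 falls after customer 2's cutoff, so both are dropped.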
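To make the per-customer windows concrete, here is a minimal sketch that derives the start and end of each window from the filter table. It assumes the timestamps are parsed into datetime64 with pd.to_datetime; the column names start and end are illustrative only.

```python
import pandas as pd

# Filter table from the question: one cutoff timestamp per customer
df1 = pd.DataFrame({
    "customer_id": [1, 2],
    "timestamp": pd.to_datetime(["2018-06-03 17:56:52", "2018-06-03 18:42:51"]),
})

# Each customer's window runs from one day before the cutoff up to the cutoff
windows = df1.assign(start=df1["timestamp"] - pd.Timedelta(days=1))
windows = windows.rename(columns={"timestamp": "end"})[["customer_id", "start", "end"]]
print(windows)
```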
Answer 0 (score: 2)
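Use map with a Series created by set_index to look up each row's cutoff timestamp, then subtract one day and filter with between:
import pandas as pd
# assumes both timestamp columns are already datetime64 (e.g. parsed with pd.to_datetime)
s = df2['customer_id'].map(df1.set_index('customer_id')['timestamp'])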
df = df2[df2['timestamp'].between(s - pd.Timedelta(1, unit='d'), s)]
print (df)
transaction_id customer_id timestamp
1 2 1 2018-06-03 02:56:52
2 3 1 2018-06-03 12:56:52
3 4 2 2018-06-03 12:40:51
4 5 2 2018-06-03 18:40:51
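An alternative, not taken from the answer above, is to attach each customer's cutoff with a merge and then apply the same inclusive between filter. This sketch rebuilds the sample data from the question and assumes the cutoff column name is free to choose (here the hypothetical name cutoff). An inner join drops customers that have no row in df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "customer_id": [1, 2],
    "timestamp": pd.to_datetime(["2018-06-03 17:56:52", "2018-06-03 18:42:51"]),
})
df2 = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5, 6],
    "customer_id": [1, 1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2018-06-02 09:56:23", "2018-06-03 02:56:52", "2018-06-03 12:56:52",
        "2018-06-03 12:40:51", "2018-06-03 18:40:51", "2018-06-03 18:48:50",
    ]),
})

# Attach each customer's cutoff as a column; the inner join drops customers absent from df1
merged = df2.merge(df1.rename(columns={"timestamp": "cutoff"}), on="customer_id", how="inner")

# Keep transactions in [cutoff - 1 day, cutoff], the same inclusive bounds as between
mask = merged["timestamp"].between(merged["cutoff"] - pd.Timedelta(days=1), merged["cutoff"])
result = merged.loc[mask, ["transaction_id", "customer_id", "timestamp"]].reset_index(drop=True)
print(result)
```

The map version keeps df2's original index, while this version resets it; which one is preferable depends on whether the original row labels matter downstream.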
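Detail: s holds the cutoff timestamp for every row of df2, so between compares each transaction against its own customer's window: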
print (s)
0 2018-06-03 17:56:52
1 2018-06-03 17:56:52
2 2018-06-03 17:56:52
3 2018-06-03 18:42:51
4 2018-06-03 18:42:51
5 2018-06-03 18:42:51
Name: customer_id, dtype: datetime64[ns]
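Because s is built with map, a customer that has no row in df1 gets NaT as its cutoff, and comparisons against NaT evaluate to False, so that customer's transactions are silently dropped. A small sketch of that behaviour, using a hypothetical customer 3 and a hypothetical transaction 7 that are not in the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({
    "customer_id": [1],
    "timestamp": pd.to_datetime(["2018-06-03 17:56:52"]),
})
# Hypothetical extra customer 3 (transaction 7), who has no row in df1
df2 = pd.DataFrame({
    "transaction_id": [2, 7],
    "customer_id": [1, 3],
    "timestamp": pd.to_datetime(["2018-06-03 02:56:52", "2018-06-03 02:56:52"]),
})

s = df2["customer_id"].map(df1.set_index("customer_id")["timestamp"])
print(s.isna().tolist())  # [False, True]: customer 3 has no cutoff, so it maps to NaT

# Comparisons against NaT are False, so customer 3's row is filtered out
out = df2[df2["timestamp"].between(s - pd.Timedelta(days=1), s)]
print(out["transaction_id"].tolist())  # [2]
```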
Answer 1 (score: 2)
You can create the window boundaries first and then check whether each timestamp lies strictly between them, i.e.:
after = df2['customer_id'].map(df1.set_index('customer_id')['timestamp'])
before = after - pd.Timedelta('1 days')
df2[(df2['timestamp'] > before) & (df2['timestamp'] < after)]
transaction_id customer_id timestamp
1 2 1 2018-06-03 02:56:52
2 3 1 2018-06-03 12:56:52
3 4 2 2018-06-03 12:40:51
4 5 2 2018-06-03 18:40:51
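The two answers agree on this sample data, but they treat the window edges differently: Series.between keeps both endpoints by default, while the strict > and < comparisons in this answer exclude them. A minimal sketch with hypothetical timestamps placed exactly on the edges of customer 1's window shows the difference:

```python
import pandas as pd

# Hypothetical timestamps for one customer, two of them exactly on the window edges
cutoff = pd.Timestamp("2018-06-03 17:56:52")
ts = pd.Series(pd.to_datetime([
    "2018-06-02 17:56:52",  # exactly one day before the cutoff
    "2018-06-03 12:56:52",  # strictly inside the window
    "2018-06-03 17:56:52",  # exactly at the cutoff
]))
before = cutoff - pd.Timedelta("1 days")

inclusive = ts.between(before, cutoff)   # answer 0: both endpoints kept
strict = (ts > before) & (ts < cutoff)   # answer 1: both endpoints dropped

print(inclusive.tolist())  # [True, True, True]
print(strict.tolist())     # [False, True, False]
```

Which behaviour is right depends on whether a transaction recorded at exactly the cutoff time should count; mixing the two (for example >= before and <= after) is also possible.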