I have two DataFrames, shown below. For each userid, I want to delete from PersonDate_List every row whose datetime is earlier than min('datetime') for the same userid in df_userid_date.
PersonDate_List (cols={'userid','datetime'})
userid datetime Score
AB-4243 2/1/2016 0
AB-4243 2/2/2016 0
AB-4243 2/3/2016 0
AB-4243 2/4/2016 0
AB-4243 2/5/2016 0
AB-4243 2/6/2016 76
AB-4243 2/7/2016 84
AB-4243 2/8/2016 84
AB-4243 2/9/2016 81
AB-4243 2/10/2016 79
NP-7585 2/1/2016 22
NP-7585 2/2/2016 23.5
NP-7585 2/3/2016 30.15
NP-7585 2/4/2016 30.15
NP-7585 2/5/2016 30.15
NP-7585 2/6/2016 30.15
NP-7585 2/7/2016 0
NP-7585 2/8/2016 0
NP-7585 2/9/2016 22.5
NP-7585 2/10/2016 45.67
VX-4376 2/1/2016 0
VX-4376 2/2/2016 0
VX-4376 2/3/2016 0
VX-4376 2/4/2016 0
VX-4376 2/5/2016 0
VX-4376 2/6/2016 0
VX-4376 2/7/2016 0
VX-4376 2/8/2016 0
VX-4376 2/9/2016 0
VX-4376 2/10/2016 33.13
df_userid_date (cols={'userid','datetime'})
userid datetime
AB-4243 2/6/2016
AB-4243 2/7/2016
AB-4243 2/9/2016
AB-4243 2/10/2016
NP-7585 2/1/2016
NP-7585 2/2/2016
NP-7585 2/3/2016
NP-7585 2/7/2016
NP-7585 2/8/2016
NP-7585 2/9/2016
NP-7585 2/10/2016
VX-4376 2/10/2016
I am looking for the result below:
userid datetime Score
AB-4243 2/6/2016 76
AB-4243 2/7/2016 84
AB-4243 2/8/2016 84
AB-4243 2/9/2016 81
AB-4243 2/10/2016 79
NP-7585 2/1/2016 22
NP-7585 2/2/2016 23.5
NP-7585 2/3/2016 30.15
NP-7585 2/4/2016 30.15
NP-7585 2/5/2016 30.15
NP-7585 2/6/2016 30.15
NP-7585 2/7/2016 0
NP-7585 2/8/2016 0
NP-7585 2/9/2016 22.5
NP-7585 2/10/2016 45.67
VX-4376 2/10/2016 33.13
I tried adding a minimum-date flag to df_userid_date and then merging it, but I couldn't work out the condition.
Answer 0 (score: 2)
Try it this way:
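import pandas as pd
df1 = pd.read_csv('PersonDate.csv')
df2 = pd.read_csv('useriddate.csv')
df1['datetime'] = pd.to_datetime(df1['datetime'])
df2['datetime'] = pd.to_datetime(df2['datetime'])
# Merge each user's minimum datetime from df2 onto df1;
# the clashing column names become datetime_x (df1) and datetime_y (minimum)
df3 = df1.merge(df2.groupby('userid', as_index=False).agg({'datetime': 'min'}), on='userid')
# Keep only rows on or after that user's minimum date
df3[df3["datetime_x"] >= df3["datetime_y"]]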
Output:
userid datetime_x Score datetime_y
5 AB-4243 2016-02-06 76.00 2016-02-06
6 AB-4243 2016-02-07 84.00 2016-02-06
7 AB-4243 2016-02-08 84.00 2016-02-06
8 AB-4243 2016-02-09 81.00 2016-02-06
9 AB-4243 2016-02-10 79.00 2016-02-06
10 NP-7585 2016-02-01 22.00 2016-02-01
11 NP-7585 2016-02-02 23.50 2016-02-01
12 NP-7585 2016-02-03 30.15 2016-02-01
13 NP-7585 2016-02-04 30.15 2016-02-01
14 NP-7585 2016-02-05 30.15 2016-02-01
15 NP-7585 2016-02-06 30.15 2016-02-01
16 NP-7585 2016-02-07 0.00 2016-02-01
17 NP-7585 2016-02-08 0.00 2016-02-01
18 NP-7585 2016-02-09 22.50 2016-02-01
19 NP-7585 2016-02-10 45.67 2016-02-01
29 VX-4376 2016-02-10 33.13 2016-02-10
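The merged output above carries the extra datetime_x / datetime_y columns, whereas the question asks for just userid, datetime and Score. A minimal sketch of one way to get that shape, assuming a small inline sample in place of the CSV files; the column name `min_datetime` is a hypothetical choice made for this illustration, not something from the original answer:

```python
import pandas as pd

# Small inline sample standing in for the CSV files (a subset of the question's data).
df1 = pd.DataFrame({
    "userid": ["AB-4243", "AB-4243", "AB-4243", "VX-4376", "VX-4376"],
    "datetime": ["2/5/2016", "2/6/2016", "2/7/2016", "2/9/2016", "2/10/2016"],
    "Score": [0, 76, 84, 0, 33.13],
})
df2 = pd.DataFrame({
    "userid": ["AB-4243", "AB-4243", "VX-4376"],
    "datetime": ["2/6/2016", "2/7/2016", "2/10/2016"],
})
df1["datetime"] = pd.to_datetime(df1["datetime"], format="%m/%d/%Y")
df2["datetime"] = pd.to_datetime(df2["datetime"], format="%m/%d/%Y")

# Per-user minimum, renamed before merging so no _x/_y suffixes appear.
# "min_datetime" is a hypothetical column name used only in this sketch.
mins = (
    df2.groupby("userid", as_index=False)["datetime"].min()
       .rename(columns={"datetime": "min_datetime"})
)

merged = df1.merge(mins, on="userid")

# Filter, then drop the helper column to get back the original three columns.
result = (
    merged[merged["datetime"] >= merged["min_datetime"]]
    .drop(columns="min_datetime")
    .reset_index(drop=True)
)
print(result)
```

Renaming before the merge keeps the filter condition readable and leaves nothing to clean up except the single helper column.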
Answer 1 (score: 0)
I'm fairly sure there is a more concise way to write this. But if no better answers come in, you can use this:
import pandas as pd
import datetime

# Read data
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

# Parse the datetime column (month/day/year strings)
df1['datetime'] = df1['datetime'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))
df2['datetime'] = df2['datetime'].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y'))

# Get the min datetime in df2 for each userid
# (quick check for a single user first)
min(list(df2[df2['userid']=='AB-4243']['datetime']))
temp = pd.DataFrame(list(set(df2['userid'])))
temp.columns = ['userid']
temp['min_datetime'] = temp['userid'].apply(lambda x: min(list(df2[df2['userid']==x]['datetime'])))

# Merge the per-user minimum into df1
df1 = df1.merge(temp, on='userid')

# Keep only rows on or after each user's minimum date
result = df1[df1['datetime'] >= df1['min_datetime']]
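On the "more concise way" this answer mentions: one possible shortening, offered as a sketch rather than the definitive approach, is to build the per-user minimum with `groupby(...).min()` and look it up with `Series.map`, which avoids the merge entirely. The inline sample data and the user `ZZ-0000` are assumptions added for illustration only:

```python
import pandas as pd

# Inline sample in place of the CSV files; "ZZ-0000" is a hypothetical user
# that appears in df1 but not in df2.
df1 = pd.DataFrame({
    "userid": ["AB-4243", "AB-4243", "VX-4376", "VX-4376", "ZZ-0000"],
    "datetime": ["2/5/2016", "2/6/2016", "2/9/2016", "2/10/2016", "2/1/2016"],
    "Score": [0, 76, 0, 33.13, 5],
})
df2 = pd.DataFrame({
    "userid": ["AB-4243", "AB-4243", "VX-4376"],
    "datetime": ["2/6/2016", "2/7/2016", "2/10/2016"],
})
df1["datetime"] = pd.to_datetime(df1["datetime"], format="%m/%d/%Y")
df2["datetime"] = pd.to_datetime(df2["datetime"], format="%m/%d/%Y")

# Series of minimum datetimes, indexed by userid.
mins = df2.groupby("userid")["datetime"].min()

# Look up each row's cutoff; users missing from df2 get NaT.
cutoff = df1["userid"].map(mins)

# Comparisons against NaT are False, so users absent from df2 are dropped,
# which matches the behaviour of the inner merge used above.
result = df1[df1["datetime"] >= cutoff]
print(result)
```

If users missing from df_userid_date should be kept instead, the mask could be extended with `| cutoff.isna()`.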