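I am trying to apply this function to a pandas DataFrame to check whether a taxi pickup or dropoff time falls within the ranges I created with the arrivemin, arrivemax, etc. variables below.
If the time does fall within the range, I want to keep the row. If it falls outside the range, I want to drop it from the DataFrame.
Start.Time, End.Time, etc. are all datetime objects, so the time arithmetic should work fine.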
def time_function(df, row):
    gametimestart = df['Start.Time']
    gametimeend = df['End.Time']
    arrivemin = gametimestart - datetime.timedelta(minutes=120)
    arrivemax = gametimeend - datetime.timedelta(minutes = 30)
    departmin = gametimeend - datetime.timedelta(minutes = 60)
    departmax = gametimeend + datetime.timedelta(minutes = 90)
    for not i in ((df['pickup_datetime'] > arrivemin) & (df['pickupdatetime'] < arrivemax) &(df['dropoff_datetime'] > departmin) & (df['dropoffdatetime'] < departmax)):
        df = df.drop[df[i.index]]
    return
for index, row in yankdf:
    time_function(yankdf, row)
I keep getting this syntax error:
File "<ipython-input-25-bda6fb2db429>", line 17
for not i in (((row['pickup_datetime'] > arrivemin) & (row['pickupdatetime'] < arrivemax)) | ((row['dropoff_datetime'] > departmin) & (row['dropoffdatetime'] < departmax)):
^
SyntaxError: invalid syntax
Answer 0 (score: 1)
I don't think you need this function. Just take a basic subset; df_filtered should then be the filtered DataFrame.
import datetime
# Game start and end times, one value per row
gametimestart = df['Start.Time']
gametimeend = df['End.Time']
# Arrival and departure windows around each game
arrivemin = gametimestart - datetime.timedelta(minutes=120)
arrivemax = gametimeend - datetime.timedelta(minutes=30)
departmin = gametimeend - datetime.timedelta(minutes=60)
departmax = gametimeend + datetime.timedelta(minutes=90)
# Keep only the rows where all four conditions hold
df_filtered = df[(df['pickup_datetime'] > arrivemin) &
                 (df['pickup_datetime'] < arrivemax) &
                 (df['dropoff_datetime'] > departmin) &
                 (df['dropoff_datetime'] < departmax)]
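
To see how the boolean-mask filter behaves, here is a small self-contained sketch. The sample data is made up for illustration, and it assumes the columns are already datetime64 (here via pd.to_datetime). It uses pd.Timedelta in place of datetime.timedelta, which should behave the same way for this purpose. The `any_window` variant with `|` is only a guess at the "pickup or dropoff" intent, based on the expression shown in the traceback. Note that `~mask` is the vectorized way to express the "not" that `for not i in ...` was reaching for; a `for` statement only accepts a plain loop target there, which is why Python reports a SyntaxError.

```python
import pandas as pd

# Hypothetical sample data: one game (19:00 to 22:00) and three taxi trips.
df = pd.DataFrame({
    "Start.Time": pd.to_datetime(["2015-06-01 19:00"] * 3),
    "End.Time": pd.to_datetime(["2015-06-01 22:00"] * 3),
    "pickup_datetime": pd.to_datetime([
        "2015-06-01 18:00",  # inside the arrival window
        "2015-06-01 12:00",  # hours before the game
        "2015-06-01 22:30",  # after the arrival window has closed
    ]),
    "dropoff_datetime": pd.to_datetime([
        "2015-06-01 21:30",  # inside the departure window
        "2015-06-01 12:20",  # hours before the game
        "2015-06-01 23:00",  # inside the departure window
    ]),
})

# Windows computed per row, same offsets as in the answer.
arrivemin = df["Start.Time"] - pd.Timedelta(minutes=120)
arrivemax = df["End.Time"] - pd.Timedelta(minutes=30)
departmin = df["End.Time"] - pd.Timedelta(minutes=60)
departmax = df["End.Time"] + pd.Timedelta(minutes=90)

# Each comparison yields a boolean Series aligned with df's index.
in_arrival = (df["pickup_datetime"] > arrivemin) & (df["pickup_datetime"] < arrivemax)
in_departure = (df["dropoff_datetime"] > departmin) & (df["dropoff_datetime"] < departmax)

# The answer's filter: both windows must match.
all_windows = in_arrival & in_departure
kept_all = df[all_windows]

# Assumed alternative: keep a trip if either window matches.
any_window = in_arrival | in_departure
kept_any = df[any_window]

# Rows that the strict filter would discard, selected with the inverted mask.
dropped = df[~all_windows]

print(kept_all.index.tolist())
print(kept_any.index.tolist())
print(dropped.index.tolist())
```

Indexing with a mask returns a new DataFrame and leaves df unchanged, so there is no need to loop over rows or call drop.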