Extracting the time from a datetime for comparison in pandas

Asked: 2018-03-17 22:20:44

Tags: python pandas datetime dataframe

I have a DataFrame:

customer_number   purchase_time         quantity
14                2007-03-01 07:06:00   10
20                2007-03-12 13:05:00   13

I am trying to find the total quantity bought in the morning and in the afternoon. I converted purchase_time to datetime:

df['purchase_time'] = pd.to_datetime(df['purchase_time'])
# Baskets bought in morning.
df[df['purchase_time'] < '12:00:00']

However, the result is the original dataset, unfiltered.

4 Answers:

Answer 0 (score: 7)

You can compare only the time-of-day component, via .dt.time:

df[df['purchase_time'].dt.time < pd.to_datetime('12:00:00').time()]
Out[152]: 
   customer_number       purchase_time  quantity
0               14 2007-03-01 07:06:00        10
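
As a self-contained sketch of the same idea: the original comparison most likely returned every row because the bare string '12:00:00' is parsed as a full timestamp on the current date, which is later than anything in 2007. Comparing against a datetime.time avoids that. The variable names noon, is_morning, morning_total and afternoon_total are illustrative and not part of the answer above.

```python
import datetime as dt

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": ["2007-03-01 07:06:00", "2007-03-12 13:05:00"],
    "quantity": [10, 13],
})
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

# Compare only the time of day, ignoring the date.
noon = dt.time(12, 0)
is_morning = df["purchase_time"].dt.time < noon

morning = df[is_morning]
afternoon = df[~is_morning]

# Totals the question asks for.
morning_total = morning["quantity"].sum()
afternoon_total = afternoon["quantity"].sum()
print(morning_total, afternoon_total)
```

Note that .dt.time yields an object-dtype Series of datetime.time values, so on large frames the .dt.hour comparison used in later answers may be faster.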

Answer 1 (score: 6)

You may not need to convert at all. If purchase_time is still a string column, just compare lexicographically:

df[df['purchase_time'].str.split().str[1] < '12:00:00']

   customer_number        purchase_time  quantity
0               14  2007-03-01 07:06:00        10

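Still, as an extra layer of safety, I would recommend converting to timedelta before comparing. The comparison still works against a string (pandas is wonderful that way):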

df[pd.to_timedelta(
       df['purchase_time'].str.split().str[1], errors='coerce'
) < '12:00:00']

   customer_number        purchase_time  quantity
0               14  2007-03-01 07:06:00        10
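
A minimal sketch of what errors='coerce' buys you, assuming purchase_time is still a string column. The third row (a date with no time part) is made up for illustration and is not in the question's data; it becomes NaT and drops out of the comparison instead of raising.

```python
import pandas as pd

# String column, as in this answer. The last row is a hypothetical
# malformed entry added only to show the coercion behaviour.
df = pd.DataFrame({
    "customer_number": [14, 20, 31],
    "purchase_time": [
        "2007-03-01 07:06:00",
        "2007-03-12 13:05:00",
        "2007-03-15",
    ],
    "quantity": [10, 13, 5],
})

# Take the part after the space; rows without one yield NaN.
time_part = df["purchase_time"].str.split().str[1]

# Unparseable or missing values become NaT rather than raising.
elapsed = pd.to_timedelta(time_part, errors="coerce")

# NaT compares as False, so the malformed row is excluded.
morning = df[elapsed < pd.Timedelta(hours=12)]
print(morning)
```

Here the threshold is written as an explicit pd.Timedelta rather than the string '12:00:00'; that is a stylistic choice in this sketch, not a correction to the answer.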

Answer 2 (score: 6)

Assuming purchase_time is of datetime dtype:

In [88]: df.query("purchase_time.dt.hour < 12 and purchase_time.dt.month in [3,6]")
Out[88]:
   customer_number       purchase_time  quantity
0               14 2007-03-01 07:06:00        10
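
For comparison, the same filter can be sketched as a plain boolean mask, which some readers may find easier to debug than a query string. The names is_morning and in_months are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": pd.to_datetime(
        ["2007-03-01 07:06:00", "2007-03-12 13:05:00"]
    ),
    "quantity": [10, 13],
})

# Same two conditions as the query string, built step by step.
is_morning = df["purchase_time"].dt.hour < 12
in_months = df["purchase_time"].dt.month.isin([3, 6])

result = df[is_morning & in_months]
print(result)
```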

Answer 3 (score: 4)

Use a boolean array in groupby:

# numeric_only=True skips the datetime column, which cannot be summed
df.groupby(df.purchase_time.dt.hour < 12).sum(numeric_only=True).rename(
    {True: 'Morning', False: 'Afternoon'})

               customer_number  quantity
purchase_time                           
Afternoon                   20        13
Morning                     14        10
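
Since the question only asks for the total quantity per period, a narrower variant might label the rows first and sum just that column. This is a sketch, not the answerer's code; period and totals are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": pd.to_datetime(
        ["2007-03-01 07:06:00", "2007-03-12 13:05:00"]
    ),
    "quantity": [10, 13],
})

# Label each row by period, then sum only the quantity column.
period = np.where(df["purchase_time"].dt.hour < 12, "Morning", "Afternoon")
totals = df.groupby(period)["quantity"].sum()
print(totals)
```

Selecting the quantity column before summing avoids adding up customer_number, which is an identifier rather than a measure.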