I have a DataFrame:
customer_number purchase_time quantity
14 2007-03-01 07:06:00 10
20 2007-03-12 13:05:00 13
I am trying to find the total quantity bought in the morning and in the afternoon. I converted purchase_time to datetime:
df['purchase_time'] = pd.to_datetime(df['purchase_time'])
# Baskets bought in morning.
df[df['purchase_time'] < '12:00:00']
However, the result is the original dataset, unfiltered.
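A likely explanation for the unfiltered result (an inference, not something stated in the thread): pandas parses a bare time string such as '12:00:00' as a full timestamp whose date part is today's date. Every purchase from 2007 is therefore earlier, and the mask is True for every row. A minimal sketch to check this, rebuilding the sample frame from the question:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": ["2007-03-01 07:06:00", "2007-03-12 13:05:00"],
    "quantity": [10, 13],
})
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

# A time-only string becomes a full timestamp; the missing date is filled
# in with the current date, not treated as "time of day only".
noon = pd.Timestamp("12:00:00")
print(noon)

# Both 2007 timestamps are earlier than noon today, so nothing is filtered out.
mask = df["purchase_time"] < "12:00:00"
print(mask.tolist())  # [True, True]
```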
Answer 0 (score: 7)
You can compare only the time-of-day component:
df[df['purchase_time'].dt.time < pd.to_datetime('12:00:00').time()]
Out[152]:
customer_number purchase_time quantity
0 14 2007-03-01 07:06:00 10
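The answer above filters the morning rows; the question asks for the totals. A sketch that reuses the same time-of-day mask to sum quantity for both halves of the day. It uses `datetime.time(12, 0)` in place of `pd.to_datetime('12:00:00').time()`, which should give the same comparison value, and rebuilds the sample frame from the question:

```python
import datetime

import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": ["2007-03-01 07:06:00", "2007-03-12 13:05:00"],
    "quantity": [10, 13],
})
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

# .dt.time yields datetime.time objects, which compare by time of day only.
is_morning = df["purchase_time"].dt.time < datetime.time(12, 0)

# Sum quantity on each side of the mask.
morning_total = df.loc[is_morning, "quantity"].sum()
afternoon_total = df.loc[~is_morning, "quantity"].sum()
print(morning_total, afternoon_total)  # 10 13
```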
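Answer 1 (score: 6)
If purchase_time is still a string column, you may not need to convert it at all; just compare the time part lexicographically: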
df[df['purchase_time'].str.split().str[1] < '12:00:00']
customer_number purchase_time quantity
0 14 2007-03-01 07:06:00 10
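For an extra layer of safety, though, I'd recommend converting the time part to a timedelta and comparing that. The comparison still works against a plain string (pandas is a marvel like that):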
df[pd.to_timedelta(
df['purchase_time'].str.split().str[1], errors='coerce'
) < '12:00:00']
customer_number purchase_time quantity
0 14 2007-03-01 07:06:00 10
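One case where the "extra layer of safety" matters is times that are not zero-padded. The value '7:06:00' below is a hypothetical example, not from the question's data. Plain string comparison puts it after '12:00:00' because '7' > '1', while the timedelta comparison uses the actual duration:

```python
import pandas as pd

# Hypothetical time strings; the last one is missing its leading zero.
times = pd.Series(["07:06:00", "13:05:00", "7:06:00"])

# Lexicographic comparison: character by character, so "7:06:00" > "12:00:00".
as_strings = (times < "12:00:00").tolist()

# Timedelta comparison: parses each value as a duration, padding irrelevant.
as_timedeltas = (pd.to_timedelta(times, errors="coerce") < pd.Timedelta("12:00:00")).tolist()

print(as_strings)     # [True, False, False]
print(as_timedeltas)  # [True, False, True]
```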
Answer 2 (score: 6)
Assuming purchase_time is of datetime dtype:
In [88]: df.query("purchase_time.dt.hour < 12 and purchase_time.dt.month in [3,6]")
Out[88]:
customer_number purchase_time quantity
0 14 2007-03-01 07:06:00 10
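The same filter can also be written as plain boolean indexing, without `query`. A sketch, rebuilding the sample frame from the question; note that the `month in [3, 6]` condition is an extra restriction carried over from the answer, not something the question asked for:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": ["2007-03-01 07:06:00", "2007-03-12 13:05:00"],
    "quantity": [10, 13],
})
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

ts = df["purchase_time"].dt
# Combine element-wise conditions with & (not `and`), each in parentheses.
mask = (ts.hour < 12) & ts.month.isin([3, 6])
morning = df[mask]
print(morning)
```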
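Answer 3 (score: 4)
Use a boolean array in groupby. Select the numeric columns before summing; recent pandas versions may raise a TypeError when asked to sum the datetime column:
df.groupby(df.purchase_time.dt.hour < 12)[['customer_number', 'quantity']].sum().rename(
    {True: 'Morning', False: 'Afternoon'})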
customer_number quantity
purchase_time
Afternoon 20 13
Morning 14 10
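A variation on the same idea, sketched here as an alternative rather than the answerer's code: label each row with `numpy.where` instead of renaming True/False afterwards, and group only the quantity column, since summing customer numbers is not meaningful. The sample frame is rebuilt from the question:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "customer_number": [14, 20],
    "purchase_time": ["2007-03-01 07:06:00", "2007-03-12 13:05:00"],
    "quantity": [10, 13],
})
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

# One label per row, aligned positionally with df.
period = np.where(df["purchase_time"].dt.hour < 12, "Morning", "Afternoon")

# Grouping only 'quantity' keeps IDs and datetimes out of the sum.
totals = df.groupby(period)["quantity"].sum()
print(totals)
```

The result is a Series indexed by the labels (sorted, so Afternoon comes first), with 13 for Afternoon and 10 for Morning.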