Question

For this question, I would like to take an approach similar to "Select DataFrame rows between two dates", but with time ranges.

I have a dataset of restaurant orders that contains the time and the order type. Breakfast, lunch, and dinner each have a time interval.

Time intervals:

Breakfast: (8:00:00-12:00:00), Lunch: (12:00:01-16:00:00), Dinner: (16:00:01-20:00:00)

Sample of the dataset:

    order_type  time
0   Lunch       13:24:30
1   Dinner      18:28:43
2   Dinner      17:17:44
3   Lunch       15:46:28
4   Lunch       12:33:48
5   Lunch       15:26:11
6   Lunch       13:04:13
7   Lunch       12:13:31
8   Breakfast   08:20:16
9   Breakfast   08:10:08
10  Dinner      18:08:27
11  Breakfast   10:42:15
12  Dinner      19:09:17
13  Dinner      18:28:43
14  Breakfast   09:21:07

My time column was originally of type object, and I converted it to timedelta64[ns].

I want to create three time ranges, one for each order_type, and then use them to verify the accuracy of my dataset.

Once I have these three ranges, I could run something like the following for loop:

for order in dirtyData['order_type']:
    for time in dirtyData['time']:
        if order=='Breakfast' and time not in BreakfastRange:
            *do something*

I consulted the documentation and this post, and tried applying between_time, but I keep running into errors.

Answer 1

You can use pd.cut:

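import pandas as pd

# Bin edges for the meal-time ranges
bins = pd.to_timedelta(['8:00:00', '12:00:00', '16:00:00', '20:00:00'])

# Label each time with the range it falls in. Bins are closed on the right,
# and include_lowest=True also admits a time of exactly 8:00:00.
df['order_type_gt'] = pd.cut(df['time'],
                             bins,
                             labels=['Breakfast', 'Lunch', 'Dinner'],
                             include_lowest=True)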

Output:

   order_type     time order_type_gt
0       Lunch 13:24:30         Lunch
1      Dinner 18:28:43        Dinner
2      Dinner 17:17:44        Dinner
3       Lunch 15:46:28         Lunch
4       Lunch 12:33:48         Lunch
5       Lunch 15:26:11         Lunch
6       Lunch 13:04:13         Lunch
7       Lunch 12:13:31         Lunch
8   Breakfast 08:20:16     Breakfast
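
To turn the derived column into the validation the question asks for, the recorded label can be compared with the label from pd.cut. The following is a minimal sketch under assumptions: the four-row frame is a reduced stand-in for the real data, its last row is deliberately mislabeled for illustration, and the names `mismatch` and `suspect_rows` are hypothetical.

```python
import pandas as pd

# Reduced stand-in for the question's data. The last row is a deliberately
# wrong label (hypothetical) so that the check has something to find.
df = pd.DataFrame({
    "order_type": ["Lunch", "Dinner", "Breakfast", "Breakfast"],
    "time": pd.to_timedelta(["13:24:30", "18:28:43", "08:20:16", "17:17:44"]),
})

bins = pd.to_timedelta(["8:00:00", "12:00:00", "16:00:00", "20:00:00"])
df["order_type_gt"] = pd.cut(df["time"], bins,
                             labels=["Breakfast", "Lunch", "Dinner"],
                             include_lowest=True)

# order_type_gt is categorical, so convert it to plain objects before
# comparing it element by element with the recorded label.
df["mismatch"] = df["order_type"] != df["order_type_gt"].astype(object)
suspect_rows = df[df["mismatch"]]
print(suspect_rows)
```

Times that fall outside every bin receive NaN from pd.cut, so with this comparison they are flagged as mismatches as well.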

Answer 2

We can use pd.cut and then simply match the output against the original order_type.

pd.cut(df.time,pd.to_timedelta(['00:00:00','12:00:00','16:00:00','23:59:59']),labels=['B','L','D'])
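
The matching step is not spelled out above. One possible way to do it, sketched here under assumptions, is to compare the first letter of each recorded order_type with the B/L/D code. The sample frame, its intentionally mislabeled last row, and the names `edges`, `codes`, and `matches` are hypothetical.

```python
import pandas as pd

# Small illustrative frame; the last row is intentionally mislabeled
# (a Lunch order recorded at a breakfast time).
df = pd.DataFrame({
    "order_type": ["Lunch", "Dinner", "Breakfast", "Lunch"],
    "time": pd.to_timedelta(["13:24:30", "18:28:43", "08:20:16", "09:21:07"]),
})

edges = pd.to_timedelta(["00:00:00", "12:00:00", "16:00:00", "23:59:59"])
codes = pd.cut(df["time"], edges, labels=["B", "L", "D"])

# Compare the first letter of the recorded label with the derived code.
matches = df["order_type"].str[0] == codes.astype(object)
print(df[~matches])
```

Note that these bin edges span the whole day, so an order at, say, 07:00:00 would still count as a valid breakfast, which is looser than the 8:00:00 to 20:00:00 intervals stated in the question.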

Creating time ranges in pandas
