Pandas group by irregular date ranges

Time: 2020-09-25 02:33:18

Tags: python pandas

I have a DataFrame with multiple datetimes:

from faker import Faker
from datetime import datetime
import pandas as pd

fake = Faker()

n = 100
start_date = datetime(2020, 1, 1, 0, 6, 0)
end_date = datetime(2020, 2, 1, 0, 17, 0)

df = pd.DataFrame({
    "datetime": [fake.date_time_between(start_date=start_date, end_date=end_date) for _ in range(n)],
    "count": 1,
}).sort_values(by="datetime")
              datetime  count
61 2020-01-01 12:56:39      1
0  2020-01-01 15:10:35      1
22 2020-01-02 09:37:50      1
41 2020-01-02 15:46:58      1
44 2020-01-03 06:49:39      1
..                 ...    ...
89 2020-01-29 09:51:02      1
98 2020-01-29 22:43:13      1
39 2020-01-30 01:40:48      1
79 2020-01-31 14:07:28      1
43 2020-01-31 20:24:43      1

I need to group by datetime with cutoffs at 6 AM and 5 PM each day (so the first group would fall in [2019-12-31T17:00:00, 2020-01-01T06:00:00), the second in [2020-01-01T06:00:00, 2020-01-01T17:00:00), and so on).

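I have looked into grouping by datetime with pd.Grouper, but I can't figure out how to use freq for intervals of different lengths (in this case alternating 13h and 11h intervals, rather than two 12h intervals).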

1 Answer:

Answer 0: (score: 1)

IIUC, you can first resample to 1H, create a group number by checking the hour of each bucket, and then groupby on that number:

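# Sum counts into hourly buckets; hours with no rows get a count of 0.
df = df.resample("1H", on="datetime").sum()

# Flag the 06:00 and 17:00 buckets as boundaries; cumsum turns the flags into running group numbers.
df["group"] = df.index.hour.isin([6, 17]).cumsum()

# For each group, report its first hourly bucket and the total count.
print(df.reset_index().groupby("group").agg({"datetime": "first", "count": "sum"}))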

                 datetime  count
group                           
0     2020-01-01 03:00:00      1
1     2020-01-01 06:00:00      3
2     2020-01-01 17:00:00      0
3     2020-01-02 06:00:00      1
4     2020-01-02 17:00:00      2
5     2020-01-03 06:00:00      1
6     2020-01-03 17:00:00      4
7     2020-01-04 06:00:00      2
8     2020-01-04 17:00:00      5
9     2020-01-05 06:00:00      2
10    2020-01-05 17:00:00      2
11    2020-01-06 06:00:00      0
12    2020-01-06 17:00:00      1
13    2020-01-07 06:00:00      2
14    2020-01-07 17:00:00      1
15    2020-01-08 06:00:00      0
16    2020-01-08 17:00:00      2
17    2020-01-09 06:00:00      2
18    2020-01-09 17:00:00      1
19    2020-01-10 06:00:00      2
20    2020-01-10 17:00:00      3
21    2020-01-11 06:00:00      0
22    2020-01-11 17:00:00      0
23    2020-01-12 06:00:00      2
24    2020-01-12 17:00:00      1
25    2020-01-13 06:00:00      1
26    2020-01-13 17:00:00      0
27    2020-01-14 06:00:00      3
28    2020-01-14 17:00:00      3
29    2020-01-15 06:00:00      1
30    2020-01-15 17:00:00      3
31    2020-01-16 06:00:00      0
32    2020-01-16 17:00:00      0
33    2020-01-17 06:00:00      1
34    2020-01-17 17:00:00      2
35    2020-01-18 06:00:00      2
36    2020-01-18 17:00:00      3
37    2020-01-19 06:00:00      2
38    2020-01-19 17:00:00      1
39    2020-01-20 06:00:00      3
40    2020-01-20 17:00:00      2
41    2020-01-21 06:00:00      2
42    2020-01-21 17:00:00      2
43    2020-01-22 06:00:00      3
44    2020-01-22 17:00:00      2
45    2020-01-23 06:00:00      1
46    2020-01-23 17:00:00      2
47    2020-01-24 06:00:00      1
48    2020-01-24 17:00:00      0
49    2020-01-25 06:00:00      0
50    2020-01-25 17:00:00      3
51    2020-01-26 06:00:00      2
52    2020-01-26 17:00:00      0
53    2020-01-27 06:00:00      3
54    2020-01-27 17:00:00      1
55    2020-01-28 06:00:00      0
56    2020-01-28 17:00:00      1
57    2020-01-29 06:00:00      1
58    2020-01-29 17:00:00      0
59    2020-01-30 06:00:00      1
60    2020-01-30 17:00:00      3
61    2020-01-31 06:00:00      0
62    2020-01-31 17:00:00      1
63    2020-02-01 06:00:00      4
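
Note that "first" reports the first hourly bucket present in each group, so group 0 above is labelled 03:00 rather than the actual window start of 2019-12-31 17:00. If you want every row labelled with the start of its [17:00, 06:00) or [06:00, 17:00) window, a possible alternative is to compute that start directly from the hour and group on it. This is only a sketch, not part of the answer above: the sample rows and the name bin_start are made up for illustration, and windows with no rows simply do not appear in the result (unlike the resample approach, which reports them with a count of 0).

```python
import pandas as pd

# Small deterministic stand-in for the question's random data (values are made up).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 03:15:00",
        "2020-01-01 06:00:00",
        "2020-01-01 16:59:59",
        "2020-01-01 17:00:00",
        "2020-01-02 05:30:00",
        "2020-01-02 12:00:00",
    ]),
    "count": 1,
})

ts = df["datetime"]
day = ts.dt.normalize()   # midnight of each row's calendar day
hour = ts.dt.hour

# Hours 17-23 belong to the window starting at 17:00 the same day,
# everything else provisionally to the window starting at 06:00 ...
bin_start = (day + pd.Timedelta(hours=17)).where(hour >= 17, day + pd.Timedelta(hours=6))
# ... except hours 0-5, which belong to the window that started at 17:00 the previous day.
bin_start = bin_start.where(hour >= 6, day - pd.Timedelta(hours=7))

# "bin_start" is a hypothetical label name chosen for this example.
result = df.groupby(bin_start.rename("bin_start"))["count"].sum()
print(result)
```

With these six rows, the 03:15 row lands in the window starting 2019-12-31 17:00, and the 16:59:59 and 17:00:00 rows land in different windows, which matches the half-open intervals described in the question.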