我有一个具有多个日期时间的DataFrame:
from faker import Faker
from datetime import datetime
import pandas as pd
fake = Faker()
n = 100
start_date = datetime(2020, 1, 1, 0, 6, 0)
end_date = datetime(2020, 2, 1, 0, 17, 0)
df = pd.DataFrame({"datetime": [fake.date_time_between(start_date=start_date, end_date=end_date) for _ in range(n)], "count": 1}).sort_values(by='datetime')
datetime count
61 2020-01-01 12:56:39 1
0 2020-01-01 15:10:35 1
22 2020-01-02 09:37:50 1
41 2020-01-02 15:46:58 1
44 2020-01-03 06:49:39 1
.. ... ...
89 2020-01-29 09:51:02 1
98 2020-01-29 22:43:13 1
39 2020-01-30 01:40:48 1
79 2020-01-31 14:07:28 1
43 2020-01-31 20:24:43 1
我需要按datetime
分组,截止时间是每天早上6点和下午5点(因此,第一组将在[2019-12-31T17:00:00, 2020-01-01T06:00:00)
中,第二组将在[2020-01-01T06:00:00, 2020-01-01T17:00:00)
中,依此类推。
我已经研究过使用datetime
按pd.Grouper
进行分组,但是我无法弄清楚如何对不同长度的间隔使用freq
(在这种情况下, 13h
和11h
,而不是12h
的两个间隔)。
答案 0 :(得分:1)
IIUC,您可以先resample
至1H
,通过检查小时数来创建组号,然后再通过groupby
进行创建:
df = df.resample("1H", on="datetime").sum()
df["group"] = df.index.hour.isin([6,17]).cumsum()
print (df.reset_index().groupby("group").agg({"datetime":"first", "count":"sum"}))
datetime count
group
0 2020-01-01 03:00:00 1
1 2020-01-01 06:00:00 3
2 2020-01-01 17:00:00 0
3 2020-01-02 06:00:00 1
4 2020-01-02 17:00:00 2
5 2020-01-03 06:00:00 1
6 2020-01-03 17:00:00 4
7 2020-01-04 06:00:00 2
8 2020-01-04 17:00:00 5
9 2020-01-05 06:00:00 2
10 2020-01-05 17:00:00 2
11 2020-01-06 06:00:00 0
12 2020-01-06 17:00:00 1
13 2020-01-07 06:00:00 2
14 2020-01-07 17:00:00 1
15 2020-01-08 06:00:00 0
16 2020-01-08 17:00:00 2
17 2020-01-09 06:00:00 2
18 2020-01-09 17:00:00 1
19 2020-01-10 06:00:00 2
20 2020-01-10 17:00:00 3
21 2020-01-11 06:00:00 0
22 2020-01-11 17:00:00 0
23 2020-01-12 06:00:00 2
24 2020-01-12 17:00:00 1
25 2020-01-13 06:00:00 1
26 2020-01-13 17:00:00 0
27 2020-01-14 06:00:00 3
28 2020-01-14 17:00:00 3
29 2020-01-15 06:00:00 1
30 2020-01-15 17:00:00 3
31 2020-01-16 06:00:00 0
32 2020-01-16 17:00:00 0
33 2020-01-17 06:00:00 1
34 2020-01-17 17:00:00 2
35 2020-01-18 06:00:00 2
36 2020-01-18 17:00:00 3
37 2020-01-19 06:00:00 2
38 2020-01-19 17:00:00 1
39 2020-01-20 06:00:00 3
40 2020-01-20 17:00:00 2
41 2020-01-21 06:00:00 2
42 2020-01-21 17:00:00 2
43 2020-01-22 06:00:00 3
44 2020-01-22 17:00:00 2
45 2020-01-23 06:00:00 1
46 2020-01-23 17:00:00 2
47 2020-01-24 06:00:00 1
48 2020-01-24 17:00:00 0
49 2020-01-25 06:00:00 0
50 2020-01-25 17:00:00 3
51 2020-01-26 06:00:00 2
52 2020-01-26 17:00:00 0
53 2020-01-27 06:00:00 3
54 2020-01-27 17:00:00 1
55 2020-01-28 06:00:00 0
56 2020-01-28 17:00:00 1
57 2020-01-29 06:00:00 1
58 2020-01-29 17:00:00 0
59 2020-01-30 06:00:00 1
60 2020-01-30 17:00:00 3
61 2020-01-31 06:00:00 0
62 2020-01-31 17:00:00 1
63 2020-02-01 06:00:00 4