My DataFrame has a date column that looks like this:
event_time
2017-01-17 00:12:50
2016-12-05 01:00:21
2016-12-04 01:14:36
2016-12-04 01:04:03
2016-12-04 02:28:23
2016-12-04 02:46:49
2016-12-04 01:58:04
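I need to add a column period that numbers 15-minute intervals starting from 00:00:00; the day, month, and year do not matter.
Times 00:00:00 - 00:15:00 are period 1, 00:15:01 - 00:30:00 are period 2, and so on.
If I use df = df.groupby(pd.TimeGrouper(freq='15Min')), the result is wrong, because it takes the date into account as well, and I only need the time.
Desired output: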
event_time period
2017-01-17 00:12:50 1
2016-12-05 01:00:21 4
2016-12-04 01:14:36 4
2016-12-04 01:04:03 4
2016-12-04 02:28:23 10
2016-12-04 02:46:49 12
2016-12-04 01:58:04 8
How can I do that?
Answer 0 (score: 1)
import pandas as pd

df = pd.DataFrame(pd.to_datetime([
"2017-01-17 00:12:50",
"2016-12-05 01:00:21",
"2016-12-04 01:14:36",
"2016-12-04 01:04:03",
"2016-12-04 02:28:23",
"2016-12-04 02:46:49",
"2016-12-04 01:58:04"]),
columns=['timestamp']
)
Then compute the period column:
df['period'] = df.timestamp.apply(lambda ts: 1 + ts.hour * 4 + ts.minute // 15)
This gives the following output:
timestamp period
0 2017-01-17 00:12:50 1
1 2016-12-05 01:00:21 5
2 2016-12-04 01:14:36 5
3 2016-12-04 01:04:03 5
4 2016-12-04 02:28:23 10
5 2016-12-04 02:46:49 12
6 2016-12-04 01:58:04 8
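There is a small difference between your output and mine in rows 1, 2, and 3: for example, 01:00:21 should be period 5, because the first hour contains four periods and the fifth begins at 01:00:00.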
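The arithmetic behind that claim can be checked without pandas. This is only a minimal sketch: the helper name period_of is made up for illustration, and it assumes each 15-minute interval is closed on the left (00:15:00 already counts as period 2), which is what the formula above implies.

```python
from datetime import datetime

def period_of(ts: datetime) -> int:
    """Return the 1-based 15-minute period of the day for ts (the date is ignored)."""
    # 4 periods per full hour, plus the quarter of the current hour, plus 1 for 1-based numbering
    return 1 + ts.hour * 4 + ts.minute // 15

samples = ["2017-01-17 00:12:50", "2016-12-05 01:00:21", "2016-12-04 02:46:49"]
periods = [period_of(datetime.strptime(s, "%Y-%m-%d %H:%M:%S")) for s in samples]
print(periods)  # [1, 5, 12]
```

So 01:00:21 lands in period 5: four full periods fit into the first hour, and the integer division adds nothing for minute 0.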
Answer 1 (score: 1)
Simpler solution, using the .dt accessor:
df['label'] = df['event_time'].dt.hour * 4 + df['event_time'].dt.minute // 15 + 1
print (df)
event_time label
0 2017-01-17 00:12:50 1
1 2016-12-05 01:00:21 5
2 2016-12-04 01:14:36 5
3 2016-12-04 01:04:03 5
4 2016-12-04 02:28:23 10
5 2016-12-04 02:46:49 12
6 2016-12-04 01:58:04 8
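Since the question started from a groupby, here is a rough sketch of how the computed label could be used as the grouping key instead of TimeGrouper. It assumes the goal is to count events per period, which the question does not state explicitly; the variable name counts is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"event_time": pd.to_datetime([
    "2017-01-17 00:12:50",
    "2016-12-05 01:00:21",
    "2016-12-04 01:14:36",
    "2016-12-04 01:04:03",
    "2016-12-04 02:28:23",
    "2016-12-04 02:46:49",
    "2016-12-04 01:58:04",
])})

# Same label as above: 1-based 15-minute period of the day, date ignored
df["label"] = df["event_time"].dt.hour * 4 + df["event_time"].dt.minute // 15 + 1

# Group on the label rather than on the full timestamp,
# so rows from different dates fall into the same period
counts = df.groupby("label").size()
print(counts)
```

Because the key depends only on the hour and minute, events from 2016-12-04 and 2016-12-05 end up in the same group when they share a period.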
Timings:
rng = pd.date_range('2017-04-03', periods=100000, freq='27min')
df = pd.DataFrame({'timestamp': rng})
df['label'] = df['timestamp'].dt.hour * 4 + df['timestamp'].dt.minute // 15 + 1
df['period'] = df.timestamp.apply(lambda ts: 1 + ts.hour * 4 + ts.minute // 15)
print (df)
In [172]: %timeit df['timestamp'].dt.hour * 4 + df['timestamp'].dt.minute // 15 + 1
10 loops, best of 3: 20.2 ms per loop
In [173]: %timeit df.timestamp.apply(lambda ts: 1 + ts.hour * 4 + ts.minute // 15)
1 loop, best of 3: 301 ms per loop
Old solution (works, but is a bit more complicated):
First convert the datetimes to timedeltas with to_timedelta (formatting the time of day with strftime), then to seconds with total_seconds.
Then use cut or searchsorted:
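import numpy as np

# seconds since midnight, taken from the time-of-day part only
df['tot'] = (pd.to_timedelta(df['event_time'].dt.strftime('%H:%M:%S'))
               .dt.total_seconds()
               .astype(int))
# one extra right edge is needed to close the last bin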
bins = np.concatenate([np.arange(24 * 4) * 900, np.array([100000])])
labels = np.arange(1, 24 * 4 + 1)
df['label'] = pd.cut(df['tot'], bins=bins, labels=labels)
df = df.assign(label1=np.searchsorted(bins, df['tot']))
print (df)
event_time tot label label1
0 2017-01-17 00:12:50 770 1 1
1 2016-12-05 01:00:21 3621 5 5
2 2016-12-04 01:14:36 4476 5 5
3 2016-12-04 01:04:03 3843 5 5
4 2016-12-04 02:28:23 8903 10 10
5 2016-12-04 02:46:49 10009 12 12
6 2016-12-04 01:58:04 7084 8 8
A similar solution that uses only a Series tot instead of a helper column:
tot = (pd.to_timedelta(df['event_time'].dt.strftime('%H:%M:%S'))
         .dt.total_seconds()
         .astype(int))
bins = np.concatenate([np.arange(24 * 4) * 900, np.array([100000])])
labels = np.arange(1, 24 * 4 + 1)
df['label'] = pd.cut(tot, bins=bins, labels=labels)
df = df.assign(label1=np.searchsorted(bins, tot))
print (df)
event_time label label1
0 2017-01-17 00:12:50 1 1
1 2016-12-05 01:00:21 5 5
2 2016-12-04 01:14:36 5 5
3 2016-12-04 01:04:03 5 5
4 2016-12-04 02:28:23 10 10
5 2016-12-04 02:46:49 12 12
6 2016-12-04 01:58:04 8 8
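One thing worth knowing when choosing between the approaches: they agree on the sample data but can differ exactly on a 15-minute boundary. The sketch below compares the hour/minute formula with the bin-based lookup on three hand-picked timestamps. It computes seconds since midnight with dt.normalize() rather than strftime, which is a substitution made here for brevity and not part of the solution above; the names by_formula and by_bins are illustrative.

```python
import numpy as np
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2016-12-04 00:14:59",
    "2016-12-04 00:15:00",
    "2016-12-04 00:15:01",
]))

# Hour/minute formula: each interval is closed on the left, so 00:15:00 starts period 2
by_formula = ts.dt.hour * 4 + ts.dt.minute // 15 + 1

# Seconds since midnight (timestamp minus its own midnight)
tot = (ts - ts.dt.normalize()).dt.total_seconds().astype(int)

# Same bin edges as above; searchsorted with the default side='left'
# treats each interval as closed on the right, so 00:15:00 still belongs to period 1
bins = np.concatenate([np.arange(24 * 4) * 900, np.array([100000])])
by_bins = np.searchsorted(bins, tot)

print(by_formula.tolist())  # [1, 2, 2]
print(by_bins.tolist())     # [1, 1, 2]
```

The right-closed bins match the wording in the question (00:00:00 - 00:15:00 is period 1), while the formula puts 00:15:00 into period 2. Note also that a timestamp at exactly 00:00:00 gets 0 from searchsorted and NaN from cut with these bins, but 1 from the formula.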