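I have a data frame with a datetime index running from 2019-04-25 15:00:00 to 2019-04-26 15:00:00.
For each hour I want to run df["mode"].value_counts() to see how many times each mode occurs in that hour.
That is, between_time("08:00", "08:02"), between_time("09:00", "09:02"), between_time("10:00", "10:02"), and so on.
My data frame looks like this: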
                    serial_number     mode
gps_updated_at
2019-04-26 15:01:00             A  standby
2019-04-26 15:02:00             A  standby
2019-04-26 15:02:00             B  standby
2019-04-26 15:02:00             B     good
2019-04-26 16:00:00             B     good
2019-04-26 16:01:00             C      bad
So for each hour I want to get, for 15:00:00:
standby    3
good       1
and for 16:00:00:
good       1
bad        1
How can I loop over each hour efficiently?
Answer 0 (score: 2)
Use DatetimeIndex.hour together with SeriesGroupBy.value_counts:
s = df.groupby(df.index.hour)['mode'].value_counts()
print(s)
gps_updated_at  mode
15              standby    3
                good       1
16              bad        1
                good       1
Name: mode, dtype: int64
print(s[15])
mode
standby    3
good       1
Name: mode, dtype: int64
print(s[16])
mode
bad     1
good    1
Name: mode, dtype: int64
df1 = df.groupby(df.index.hour)['mode'].value_counts().reset_index(name='count')
print(df1)
   gps_updated_at     mode  count
0              15  standby      3
1              15     good      1
2              16      bad      1
3              16     good      1
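For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (the construction code is my own assumption about how the frame was created) and then applies the same groupby on the hour of the index.

```python
import pandas as pd

# Rebuild the sample data from the question; the index is named gps_updated_at.
idx = pd.DatetimeIndex(
    [
        "2019-04-26 15:01:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 16:00:00",
        "2019-04-26 16:01:00",
    ],
    name="gps_updated_at",
)
df = pd.DataFrame(
    {
        "serial_number": ["A", "A", "B", "B", "B", "C"],
        "mode": ["standby", "standby", "standby", "good", "good", "bad"],
    },
    index=idx,
)

# Group rows by the hour of the index, then count each mode within the hour.
counts = df.groupby(df.index.hour)["mode"].value_counts()
print(counts)

# Flatten the result into a regular data frame with a "count" column.
table = counts.reset_index(name="count")
print(table)
```

No explicit loop is needed: the groupby does the per-hour split in one pass.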
Alternatively, use DatetimeIndex.floor to set the minutes and seconds to 0, then DatetimeIndex.strftime to format the result as a string:
s = df.groupby(df.index.floor('H').strftime('%H:%M:%S'))['mode'].value_counts()
print(s)
          mode
15:00:00  standby    3
          good       1
16:00:00  bad        1
          good       1
Name: mode, dtype: int64
print(s['15:00:00'])
mode
standby    3
good       1
Name: mode, dtype: int64
print(s['16:00:00'])
mode
bad     1
good    1
Name: mode, dtype: int64
df2 = df.groupby(df.index.floor('H').strftime('%H:%M:%S').rename('hour'))['mode'].value_counts().reset_index(name='count')
print(df2)
       hour     mode  count
0  15:00:00  standby      3
1  15:00:00     good      1
2  16:00:00      bad      1
3  16:00:00     good      1
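One thing to watch: grouping by the hour alone, or by an hour string like '15:00:00', puts the same hour from different days into one bucket. The question's range spans two days, so that may or may not be what is wanted. The sketch below uses three made-up timestamps (hypothetical data, not from the question) to show the difference between grouping by df.index.hour and grouping by the floored timestamp itself. It spells the frequency as 'h', which newer pandas releases prefer over the older 'H' alias.

```python
import pandas as pd

# Hypothetical rows: the 15:00 hour appears on two different days.
idx = pd.DatetimeIndex(
    ["2019-04-25 15:10:00", "2019-04-26 15:01:00", "2019-04-26 15:02:00"],
    name="gps_updated_at",
)
df = pd.DataFrame({"mode": ["good", "standby", "standby"]}, index=idx)

# Hour only: both days collapse into the single key 15.
by_hour = df.groupby(df.index.hour)["mode"].value_counts()
print(by_hour)

# Floored timestamp: each day keeps its own 15:00:00 bucket.
by_floor = df.groupby(df.index.floor("h"))["mode"].value_counts()
print(by_floor)
```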
Answer 1 (score: 2)
Use GroupBy with pd.Grouper. The reason I use Grouper is that it keeps the date intact.
df.groupby(pd.Grouper(key='gps_updated_at', freq='H'))['mode'].value_counts()
Output:
gps_updated_at       mode
2019-04-26 15:00:00  standby    3
                     good       1
2019-04-26 16:00:00  bad        1
                     good       1
Name: mode, dtype: int64
If you want a data frame back, use reset_index:
df.groupby(pd.Grouper(key='gps_updated_at', freq='H'))['mode'].value_counts().reset_index(name='count')
Output:
       gps_updated_at     mode  count
0 2019-04-26 15:00:00  standby      3
1 2019-04-26 15:00:00     good      1
2 2019-04-26 16:00:00      bad      1
3 2019-04-26 16:00:00     good      1
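The key argument of pd.Grouper names a column. In the question gps_updated_at is the index rather than a column, so a variant that should fit that layout is to leave key out and let Grouper bin on the DatetimeIndex directly. The sketch below assumes the sample frame from the question, rebuilt by hand, and again uses the 'h' spelling of the hourly frequency.

```python
import pandas as pd

# Sample data from the question, with gps_updated_at as the index.
idx = pd.DatetimeIndex(
    [
        "2019-04-26 15:01:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 16:00:00",
        "2019-04-26 16:01:00",
    ],
    name="gps_updated_at",
)
df = pd.DataFrame(
    {"mode": ["standby", "standby", "standby", "good", "good", "bad"]},
    index=idx,
)

# No key: Grouper bins the DatetimeIndex itself into hourly buckets.
per_hour = df.groupby(pd.Grouper(freq="h"))["mode"].value_counts()
print(per_hour)

# Same counts as a flat data frame, with the full timestamp preserved.
per_hour_df = per_hour.reset_index(name="count")
print(per_hour_df)
```

If the timestamps do live in a regular column, for example after df.reset_index(), the key='gps_updated_at' form from the answer applies as written.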