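Create hourly intervals and iterate over them to get value_counts (python pandas)

Time: 2019-10-14 08:27:19

Tags: python pandas

I have a DataFrame with a datetime index running from 2019-04-25 15:00:00 to 2019-04-26 15:00:00.

For each hour I want to run df["mode"].value_counts() to see how many times each mode occurs in that hour.

That is, between_time("08:00", "08:02"), between_time("09:00", "09:02"), between_time("10:00", "10:02"), and so on.

My DataFrame looks like this: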

                        serial_number     mode
gps_updated_at

2019-04-26 15:01:00       A               standby
2019-04-26 15:02:00       A               standby
2019-04-26 15:02:00       B               standby
2019-04-26 15:02:00       B               good
2019-04-26 16:00:00       B               good
2019-04-26 16:01:00       C               bad
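
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the six rows above. The values are copied from the table; the variable name `overall` is only for illustration.

```python
import pandas as pd

# Rebuild the six sample rows shown above; values are copied from the table.
df = pd.DataFrame(
    {
        "serial_number": ["A", "A", "B", "B", "B", "C"],
        "mode": ["standby", "standby", "standby", "good", "good", "bad"],
    },
    index=pd.DatetimeIndex(
        [
            "2019-04-26 15:01:00",
            "2019-04-26 15:02:00",
            "2019-04-26 15:02:00",
            "2019-04-26 15:02:00",
            "2019-04-26 16:00:00",
            "2019-04-26 16:01:00",
        ],
        name="gps_updated_at",
    ),
)

# Counts over the whole frame, ignoring the hour.
# This is the call that should be split up per hour.
overall = df["mode"].value_counts()
```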

So for the hour starting at 15:00:00 I would like to get:

standby  3
good     1

and for 16:00:00:

good      1
bad       1

How can I loop over each hour efficiently?

2 answers:

Answer 0: (score: 2)

Use DatetimeIndex.hour together with SeriesGroupBy.value_counts:

s = df.groupby(df.index.hour)['mode'].value_counts()
print (s)
gps_updated_at  mode   
15              standby    3
                good       1
16              bad        1
                good       1
Name: mode, dtype: int64

print (s[15])
mode
standby    3
good       1
Name: mode, dtype: int64

print (s[16])
mode
bad     1
good    1
Name: mode, dtype: int64

df1 = df.groupby(df.index.hour)['mode'].value_counts().reset_index(name='count')
print (df1)
   gps_updated_at     mode  count
0              15  standby      3
1              15     good      1
2              16      bad      1
3              16     good      1
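
One thing to keep in mind with this approach: the group key is the hour of day only, so rows from different dates that share an hour are counted together. Since the question's index spans two days, that may or may not be what is wanted. A small sketch of the effect, where the first row is a hypothetical extra reading from the previous day that is not in the original sample:

```python
import pandas as pd

idx = pd.to_datetime(
    [
        "2019-04-25 15:10:00",  # hypothetical row from the previous day
        "2019-04-26 15:01:00",
        "2019-04-26 15:02:00",
        "2019-04-26 16:00:00",
    ]
)
df = pd.DataFrame({"mode": ["bad", "standby", "standby", "good"]}, index=idx)
df.index.name = "gps_updated_at"

# Group by hour of day: both dates fall into the same bucket for hour 15.
by_hour = df.groupby(df.index.hour)["mode"].value_counts()

# Counts for hour 15 as a plain {mode: count} dict.
hour_15 = by_hour.loc[15].to_dict()
```

Here `hour_15` contains the "bad" row from 2019-04-25 alongside the two "standby" rows from 2019-04-26.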

Or use DatetimeIndex.floor with DatetimeIndex.strftime to set the minutes and seconds to 0:

s = df.groupby(df.index.floor('H').strftime('%H:%M:%S'))['mode'].value_counts()
print (s)
          mode   
15:00:00  standby    3
          good       1
16:00:00  bad        1
          good       1
Name: mode, dtype: int64

print (s['15:00:00'])
mode
standby    3
good       1
Name: mode, dtype: int64

print (s['16:00:00'])
mode
bad     1
good    1
Name: mode, dtype: int64

df2 = df.groupby(df.index.floor('H').strftime('%H:%M:%S').rename('hour'))['mode'].value_counts().reset_index(name='count')
print (df2)
       hour     mode  count
0  15:00:00  standby      3
1  15:00:00     good      1
2  16:00:00      bad      1
3  16:00:00     good      1
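
If the date should stay part of the key, the strftime step can be dropped so that each group is labelled with the full floored timestamp. The sketch below also shows one way to walk the result hour by hour instead of calling between_time repeatedly. The name `per_hour` is illustrative, and the lowercase "h" alias is used on the assumption that the installed pandas accepts it (recent releases deprecate the uppercase "H").

```python
import pandas as pd

idx = pd.DatetimeIndex(
    [
        "2019-04-26 15:01:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 16:00:00",
        "2019-04-26 16:01:00",
    ],
    name="gps_updated_at",
)
df = pd.DataFrame(
    {"mode": ["standby", "standby", "standby", "good", "good", "bad"]},
    index=idx,
)

# Floor each timestamp to the start of its hour; the date stays in the key.
hourly = df.groupby(df.index.floor("h"))["mode"].value_counts()

# Walk the result one hour at a time.
per_hour = {}
for hour_start, counts in hourly.groupby(level=0):
    # counts still carries both index levels, so drop the hour level.
    per_hour[hour_start] = counts.droplevel(0).to_dict()
```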

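Answer 1: (score: 2)

Use GroupBy together with pd.Grouper. The reason I use Grouper is that it keeps the date intact. Note that key= selects a column, so the code below assumes gps_updated_at is a regular column (for example after reset_index) rather than the index.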

df.groupby(pd.Grouper(key='gps_updated_at', freq='H'))['mode'].value_counts()

Output

gps_updated_at       mode   
2019-04-26 15:00:00  standby    3
                     good       1
2019-04-26 16:00:00  bad        1
                     good       1
Name: mode, dtype: int64

If you want a DataFrame back, use reset_index:

df.groupby(pd.Grouper(key='gps_updated_at', freq='H'))['mode'].value_counts().reset_index(name='count')

Output

       gps_updated_at     mode  count
0 2019-04-26 15:00:00  standby      3
1 2019-04-26 15:00:00     good      1
2 2019-04-26 16:00:00      bad      1
3 2019-04-26 16:00:00     good      1
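
Since the question keeps gps_updated_at as the index, here is a sketch of the same idea for that layout. As far as I know, pd.Grouper with only freq groups on the index, while key= is for a column, so both variants are shown side by side. The lowercase "h" alias is an assumption about the installed pandas version; older releases use "H".

```python
import pandas as pd

idx = pd.DatetimeIndex(
    [
        "2019-04-26 15:01:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 15:02:00",
        "2019-04-26 16:00:00",
        "2019-04-26 16:01:00",
    ],
    name="gps_updated_at",
)
df = pd.DataFrame(
    {"mode": ["standby", "standby", "standby", "good", "good", "bad"]},
    index=idx,
)

# Timestamps in the index: Grouper only needs freq.
counts = df.groupby(pd.Grouper(freq="h"))["mode"].value_counts()

# Timestamps in a regular column: key= selects that column.
as_column = df.reset_index()
counts_from_column = as_column.groupby(
    pd.Grouper(key="gps_updated_at", freq="h")
)["mode"].value_counts()
```

Both results are keyed by the full hourly timestamp, so readings from different days stay separate.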