Pandas max along groups of n columns

Time: 2020-02-17 15:24:00

Tags: pandas

I have a DataFrame with a datetime index, as follows:

137962500   137975000   137987500   138000000   138012500   138025000   138037500   138050000   138062500
Datetime                                    
2020-02-05 11:06:00+00:00   -112.0  -114.0  -114.0  -114.0  -114.0  -116.0  -114.0  -114.0  -114.0
2020-02-05 11:07:00+00:00   -112.0  -111.0  -112.0  -112.0  -112.0  -112.0  -113.0  -113.0  -112.0
2020-02-05 11:08:00+00:00   -113.0  -112.0  -112.0  -112.0  -112.0  -112.0  -112.0  -112.0  -112.0
2020-02-05 11:09:00+00:00   -111.0  -112.0  -111.0  -112.0  -112.0  -112.0  -112.0  -112.0  -112.0
2020-02-05 11:10:00+00:00   -111.0  -112.0  -111.0  -112.0  -113.0  -113.0  -112.0  -112.0  -112.0

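I know df.max(axis=1) gives the maximum of each row. How can this be extended to split the DataFrame into groups of n columns and get the maximum within each group? That would reduce the number of columns in the "wide" data while still keeping the maximum of each smaller group of columns.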
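For reference, max(axis=0) collapses the rows and returns one value per column, while max(axis=1) collapses the columns and returns one value per row. The sketch below shows this on the first two rows and first three columns of the data above; the trimmed frame is only for illustration.

```python
import pandas as pd

# First two rows and three columns of the question's data (trimmed for illustration)
df = pd.DataFrame(
    [[-112.0, -114.0, -114.0],
     [-112.0, -111.0, -112.0]],
    index=pd.to_datetime(["2020-02-05 11:06:00+00:00", "2020-02-05 11:07:00+00:00"]),
    columns=[137962500, 137975000, 137987500],
)

col_max = df.max(axis=0)  # reduces down the rows: one value per column
row_max = df.max(axis=1)  # reduces across the columns: one value per row

print(col_max.tolist())  # [-112.0, -111.0, -112.0]
print(row_max.tolist())  # [-112.0, -111.0]
```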

Thanks

1 Answer:

Answer 0 (score: 1)

Assuming you want to split the columns into groups of 4:

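import numpy as np

group_size = 4
groups = np.arange(len(df.columns)) // group_size  # 0,0,0,0,1,1,1,1,2 for 9 columns
# label each group "min - max" of the column names it contains
labels = df.columns.to_series().groupby(groups).transform(lambda g: f'{g.min()} - {g.max()}')

# groupby(..., axis=1) is deprecated since pandas 2.1, so group the transpose instead
df.T.groupby(labels).max().T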
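To see what the two helper objects in the answer contain, the sketch below rebuilds only the column index from the question and prints them: groups assigns each column position a group number, and labels maps every column name to a "min - max" string for its group. The variable cols is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Column names copied from the question's frame (cols is a name used only here)
cols = pd.Index([137962500, 137975000, 137987500, 138000000, 138012500,
                 138025000, 138037500, 138050000, 138062500])

group_size = 4
groups = np.arange(len(cols)) // group_size  # integer division -> group number per position
labels = cols.to_series().groupby(groups).transform(lambda g: f"{g.min()} - {g.max()}")

print(groups.tolist())           # [0, 0, 0, 0, 1, 1, 1, 1, 2]
print(labels.unique().tolist())
# ['137962500 - 138000000', '138012500 - 138050000', '138062500 - 138062500']
```

Because labels is indexed by the column names, grouping by it lines up each column with its group label.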

Result:

                           137962500 - 138000000  138012500 - 138050000  138062500 - 138062500
Datetime                                                                                      
2020-02-05 11:06:00+00:00                 -112.0                 -114.0                 -114.0
2020-02-05 11:07:00+00:00                 -111.0                 -112.0                 -112.0
2020-02-05 11:08:00+00:00                 -112.0                 -112.0                 -112.0
2020-02-05 11:09:00+00:00                 -111.0                 -112.0                 -112.0
2020-02-05 11:10:00+00:00                 -111.0                 -112.0                 -112.0
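As an alternative that avoids grouping along columns altogether (groupby(..., axis=1) was deprecated in pandas 2.1), one can slice consecutive blocks of n columns by position with iloc and take the row-wise max of each block. This is a swapped-in technique, not the answer's method; the helper name max_per_column_block is hypothetical, and it labels each block by its first and last column rather than by min and max, which only coincide when the columns are sorted, as they are here.

```python
import pandas as pd

def max_per_column_block(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Row-wise max over consecutive blocks of n columns; the last block may be shorter."""
    out = {}
    for start in range(0, df.shape[1], n):
        block = df.iloc[:, start:start + n]            # positional slice of up to n columns
        label = f"{block.columns[0]} - {block.columns[-1]}"
        out[label] = block.max(axis=1)                 # one max per row within this block
    return pd.DataFrame(out, index=df.index)           # dict order keeps blocks in position order

# First two rows of the question's data
cols = [137962500, 137975000, 137987500, 138000000, 138012500,
        138025000, 138037500, 138050000, 138062500]
df = pd.DataFrame(
    [[-112.0, -114.0, -114.0, -114.0, -114.0, -116.0, -114.0, -114.0, -114.0],
     [-112.0, -111.0, -112.0, -112.0, -112.0, -112.0, -113.0, -113.0, -112.0]],
    index=pd.to_datetime(["2020-02-05 11:06:00+00:00", "2020-02-05 11:07:00+00:00"]),
    columns=cols,
)

result = max_per_column_block(df, 4)
print(result)
```

Building the result from a dict keeps the blocks in column order, whereas groupby sorts its string keys, which can reorder labels whose numbers have different digit counts.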