I have a DataFrame with a datetime index, as follows:
137962500 137975000 137987500 138000000 138012500 138025000 138037500 138050000 138062500
Datetime
2020-02-05 11:06:00+00:00 -112.0 -114.0 -114.0 -114.0 -114.0 -116.0 -114.0 -114.0 -114.0
2020-02-05 11:07:00+00:00 -112.0 -111.0 -112.0 -112.0 -112.0 -112.0 -113.0 -113.0 -112.0
2020-02-05 11:08:00+00:00 -113.0 -112.0 -112.0 -112.0 -112.0 -112.0 -112.0 -112.0 -112.0
2020-02-05 11:09:00+00:00 -111.0 -112.0 -111.0 -112.0 -112.0 -112.0 -112.0 -112.0 -112.0
2020-02-05 11:10:00+00:00 -111.0 -112.0 -111.0 -112.0 -113.0 -113.0 -112.0 -112.0 -112.0
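I know that df.max(axis=1) returns the maximum of each row. How can we extend this idea to split the DataFrame's columns into groups of n columns and get the row-wise maximum within each group? This would reduce the number of columns in the "wide" data format while keeping the maximum of each smaller group of columns.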
Thanks
Answer 0 (score: 1)
Suppose you want to split the columns into groups of 4 columns each:
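import numpy as np
group_size = 4
# Integer group id per column position: 0, 0, 0, 0, 1, 1, 1, 1, 2
groups = np.arange(df.columns.shape[0]) // group_size
# Label every column with the "min - max" range of its group
labels = df.columns.to_series().groupby(groups).transform(lambda g: f'{g.min()} - {g.max()}')
# Note: groupby(..., axis=1) is deprecated since pandas 2.1
df.groupby(labels, axis=1).max()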
Result:
137962500 - 138000000 138012500 - 138050000 138062500 - 138062500
Datetime
2020-02-05 11:06:00+00:00 -112.0 -114.0 -114.0
2020-02-05 11:07:00+00:00 -111.0 -112.0 -112.0
2020-02-05 11:08:00+00:00 -112.0 -112.0 -112.0
2020-02-05 11:09:00+00:00 -111.0 -112.0 -112.0
2020-02-05 11:10:00+00:00 -111.0 -112.0 -112.0