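When calling pandas.DataFrame.groupby().shift(), the columns of the result appear to be re-sorted by column label. The sort parameter of groupby only applies to rows.
Here is an example: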
import pandas as pd
df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6]
                   })
df.set_index('A',inplace=True)
df = df[['E','C','D','B']]
df
#         E    C  D   B
# A
# group1  a  100  1  10
# group1  b  102  2  12
# group2  c  100  3  10
# group2  d  250  4  25
# group3  e  100  5  10
# group3  f  102  6  12
From here, I would like to get:
#         E    C  D   B    C_s  D_s   B_s
# A
# group1  a  100  1  10  102.0  2.0  12.0
# group1  b  102  2  12    NaN  NaN   NaN
# group2  c  100  3  10  250.0  4.0  25.0
# group2  d  250  4  25    NaN  NaN   NaN
# group3  e  100  5  10  102.0  6.0  12.0
# group3  f  102  6  12    NaN  NaN   NaN
However,
df[['C_s','D_s','B_s']]= df.groupby(level='A')[['C','D','B']].shift(-1)
results in:
#         E    C  D   B   C_s    D_s  B_s
# A
# group1  a  100  1  10  12.0  102.0  2.0
# group1  b  102  2  12   NaN    NaN  NaN
# group2  c  100  3  10  25.0  250.0  4.0
# group2  d  250  4  25   NaN    NaN  NaN
# group3  e  100  5  10  12.0  102.0  6.0
# group3  f  102  6  12   NaN    NaN  NaN
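The mismatch above is consistent with the assignment pairing the target names with the result columns by position rather than by name. The following is a minimal sketch for checking this, assuming the same data as in the question. The names `targets` and `pairing` are only illustrative. On pandas versions that do not reorder the columns, the printed order is simply the requested one.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6]}).set_index('A')
df = df[['E', 'C', 'D', 'B']]

cols = ['C', 'D', 'B']
targets = ['C_s', 'D_s', 'B_s']  # illustrative target names
shifted = df.groupby(level='A')[cols].shift(-1)

# Column order actually returned by shift; it may differ from `cols`
# on pandas versions affected by the reordering described above.
print(list(shifted.columns))

# Each target name is paired with the result column in the same position.
pairing = dict(zip(targets, shifted.columns))
print(pairing)
```

If the printed pairing maps `C_s` to anything other than `C`, the assignment will put values under the wrong names.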
Introducing an artificial sort order for the columns keeps each shifted column logically paired with its source column:
df = df.sort_index(axis=1)
df[['B_s','C_s','D_s']]= df.groupby(level='A')[['B','C','D']].shift(-1).sort_index(axis=1)
df
#          B    C  D  E   B_s    C_s  D_s
# A
# group1  10  100  1  a  12.0  102.0  2.0
# group1  12  102  2  b   NaN    NaN  NaN
# group2  10  100  3  c  25.0  250.0  4.0
# group2  25  250  4  d   NaN    NaN  NaN
# group3  10  100  5  e  12.0  102.0  6.0
# group3  12  102  6  f   NaN    NaN  NaN
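If sorting the frame's columns is undesirable, one alternative is to assign each shifted column by name, so the order returned by shift no longer matters. This is only a sketch under the assumption that the shift result contains every requested column. The `cols` list and the `_s` suffix follow the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6]}).set_index('A')
df = df[['E', 'C', 'D', 'B']]

cols = ['C', 'D', 'B']
shifted = df.groupby(level='A')[cols].shift(-1)

for col in cols:
    # Look the column up by label, then assign its values positionally
    # so no alignment on the non-unique index 'A' is needed.
    df[col + '_s'] = shifted[col].to_numpy()

print(df)
```

The original column order of the frame is left untouched, and the new columns are appended in the order given by `cols`.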
Why are the columns reordered in the first place?
Answer 0 (score: 3)
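In my opinion this is a bug.
As a workaround, apply the shift to each group through a custom function:
# A list selects the columns (tuple-style selection is not supported in recent pandas);
# group_keys=False keeps the original index so the assignment lines up.
df[['C_s','D_s','B_s']] = (df.groupby(level='A', group_keys=False)[['C','D','B']]
                             .apply(pd.DataFrame.shift, periods=-1))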
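Another way to sidestep the column ordering entirely is to shift one column at a time, since a single selected column has no order to lose. This is a hedged sketch using the data from the question, not a statement about how pandas implements the frame-level shift.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6]}).set_index('A')
df = df[['E', 'C', 'D', 'B']]

for col in ['C', 'D', 'B']:
    # Selecting a single column gives a per-group Series shift,
    # so each result is unambiguously tied to its source column.
    df[col + '_s'] = df.groupby(level='A')[col].shift(-1).to_numpy()

print(df)
```

This runs one groupby per column, which is slower than a single frame-level call on wide frames but keeps the pairing explicit.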
Thanks to @cᴏʟᴅsᴘᴇᴇᴅ for another solution: