Consider the series:
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))
I can groupby a combination of index levels and get the idxmax:
s.groupby(level=[0, 2]).idxmax()
A Four (A, c, Four)
One (A, d, One)
Three (A, c, Three)
Two (A, d, Two)
B Four (B, d, Four)
One (B, d, One)
Three (B, c, Three)
Two (B, b, Two)
C Four (C, b, Four)
One (C, a, One)
Three (C, a, Three)
Two (C, e, Two)
D Four (D, b, Four)
One (D, e, One)
Three (D, b, Three)
Two (D, c, Two)
E Four (E, c, Four)
One (E, a, One)
Three (E, c, Three)
Two (E, a, Two)
dtype: object
I want the numeric position of each max within each group.
I can get a numeric position using the awesome answers to this question:
s.groupby(level=[0, 2]).idxmax().apply(lambda x: s.index.get_loc(x))
A Four 11
One 12
Three 10
Two 13
B Four 35
One 32
Three 30
Two 25
C Four 67
One 60
Three 62
Two 77
D Four 47
One 56
Three 46
Two 49
E Four 91
One 80
Three 90
Two 81
dtype: int64
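To make the difference between the two kinds of position concrete, here is a minimal sketch on hypothetical toy data (the names `toy`, `label`, `global_pos`, `group` and `local_pos` are made up for illustration and are not part of the question). Calling `get_loc` on the full index gives the position in the whole series, while calling it on a group's own index gives the position inside that group.

```python
import pandas as pd

# Hypothetical toy data: 2 outer keys x 3 inner keys, values chosen by hand.
idx = pd.MultiIndex.from_product([list('AB'), list('abc')])
toy = pd.Series([1.0, 3.0, 2.0, 9.0, 4.0, 5.0], index=idx)

# Label of the max inside group 'B'.
label = toy.groupby(level=0).idxmax()['B']

# Position of that label in the whole series.
global_pos = toy.index.get_loc(label)

# Position of the same label inside group 'B' only.
group = toy[toy.index.get_level_values(0) == 'B']
local_pos = group.index.get_loc(label)

print(label, global_pos, local_pos)
```

Here the max of group 'B' is the fourth element overall but the first element of its group, which is the gap between the output above and the output wanted.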
But those are positions in the full series. I want this:
A Four 2
One 3
Three 2
Two 3
B Four 3
One 3
Three 2
Two 1
C Four 1
One 0
Three 0
Two 4
D Four 1
One 4
Three 1
Two 2
E Four 2
One 0
Three 2
Two 0
dtype: int64
Answer 0 (score: 6)
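Okay, I finally have a solution. It uses NumPy's reshape and then takes argmax along one of the axes. I'm not sure it's elegant, but I expect it to perform well. I'm also assuming the multi-indexed pandas Series has a regular format, i.e. each level keeps the same number of elements across all indices (the index is a full product of its levels).
Here's the implementation: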
L0, L1, L2 = s.index.levels[:3]
# Series.sortlevel() was deprecated and later removed; sort_index() replaces it
IDs = s.sort_index().values.reshape(-1, len(L0), len(L1), len(L2)).argmax(2)
sOut = pd.Series(IDs.ravel(), pd.MultiIndex.from_product([L0, L2]))
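As a sanity check, the reshape idea can be run end to end and compared with a plain groupby. This is only a sketch under the assumptions stated above (a full product index), plus one more: the middle level's labels already appear in sorted order in the original series, so a position along the sorted axis equals the position within the group. The names `cube`, `ids`, `s_out` and `expected` are mine.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

L0, L1, L2 = s.index.levels[:3]

# Sorting puts the values in (L0, L1, L2) level order, so the reshape
# yields one axis per index level.
cube = s.sort_index().values.reshape(len(L0), len(L1), len(L2))

# argmax over the middle axis: position of the max among the L1 labels
# for every (L0, L2) pair.
ids = cube.argmax(axis=1)
s_out = pd.Series(ids.ravel(), pd.MultiIndex.from_product([L0, L2]))

# Reference result computed group by group.
expected = s.groupby(level=[0, 2]).apply(lambda x: x.index.get_loc(x.idxmax()))

print((s_out.values == expected.values).all())
```

If the index were missing some combinations, the reshape would fail or misalign, so the groupby version is the safer choice for irregular data.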
Timing (added by pir)
Answer 1 (score: 4)
This is how I did it:
s.groupby(level=[0, 2]).apply(lambda x: x.index.get_loc(x.idxmax()))
A Four 2
One 3
Three 2
Two 3
B Four 3
One 3
Three 2
Two 1
C Four 1
One 0
Three 0
Two 4
D Four 1
One 4
Three 1
Two 2
E Four 2
One 0
Three 2
Two 0
dtype: int64
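A possible variation, not from the answer itself: since the goal is a position rather than a label, the round trip through `idxmax` and `get_loc` could be replaced by NumPy's `argmax` on each group's values. The sketch below rebuilds the series from the question and checks that both routes agree; `via_labels` and `via_argmax` are names I made up.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

grouped = s.groupby(level=[0, 2])

# Route used in the answer: label of the max, then its position in the group.
via_labels = grouped.apply(lambda x: x.index.get_loc(x.idxmax()))

# Variation: ask NumPy for the position of the max directly.
via_argmax = grouped.apply(lambda x: x.values.argmax())

print((via_labels.values == via_argmax.values).all())
```

Both return the first occurrence when a group has tied maxima, so they should agree on data like this; I have not timed one against the other.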