Return a Series of numerical positions representing the index of the max within each group

Posted: 2016-07-18 17:05:19

Tags: python pandas

Consider the series:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

I can groupby a combination of index levels and get the idxmax:

s.groupby(level=[0, 2]).idxmax()

A  Four      (A, c, Four)
   One        (A, d, One)
   Three    (A, c, Three)
   Two        (A, d, Two)
B  Four      (B, d, Four)
   One        (B, d, One)
   Three    (B, c, Three)
   Two        (B, b, Two)
C  Four      (C, b, Four)
   One        (C, a, One)
   Three    (C, a, Three)
   Two        (C, e, Two)
D  Four      (D, b, Four)
   One        (D, e, One)
   Three    (D, b, Three)
   Two        (D, c, Two)
E  Four      (E, c, Four)
   One        (E, a, One)
   Three    (E, c, Three)
   Two        (E, a, Two)
dtype: object

I want the numerical position of each max within its group.

With the help of the awesome answers to this question, I can get the numerical positions within the whole series:
s.groupby(level=[0, 2]).idxmax().apply(lambda x: s.index.get_loc(x))

A  Four     11
   One      12
   Three    10
   Two      13
B  Four     35
   One      32
   Three    30
   Two      25
C  Four     67
   One      60
   Three    62
   Two      77
D  Four     47
   One      56
   Three    46
   Two      49
E  Four     91
   One      80
   Three    90
   Two      81
dtype: int64
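The global positions above can be converted into within-group positions. The following is a minimal sketch that swaps in `GroupBy.cumcount`, a technique not used in the question. It assumes the group labels are unique, which holds for this series. `cumcount` numbers the rows of each group in the order they appear in `s`, so the counter value at the row holding each group's max is that max's position within the group.

```python
import numpy as np
import pandas as pd

# Rebuild the series from the question.
np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

g = s.groupby(level=[0, 2])

# Running counter 0, 1, 2, ... within each (level 0, level 2) group,
# in the order rows appear in s; the result is indexed exactly like s.
within = g.cumcount()

# Full index label (a tuple) of the max in each group.
labels = g.idxmax()

# Global positions of those labels in s (the same numbers as above) ...
global_pos = s.index.get_indexer(labels.tolist())

# ... used to read off the within-group counter at each max.
result = pd.Series(within.to_numpy()[global_pos], index=labels.index)
print(result)
```

This approach resolves all labels in one vectorized `get_indexer` call rather than calling `get_loc` once per group.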

But I want the position within each group, like this:

A  Four     2
   One      3
   Three    2
   Two      3
B  Four     3
   One      3
   Three    2
   Two      1
C  Four     1
   One      0
   Three    0
   Two      4
D  Four     1
   One      4
   Three    1
   Two      2
E  Four     2
   One      0
   Three    2
   Two      0
dtype: int64

2 Answers:

Answer 0 (score: 6)

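Well, I finally have a solution. It uses NumPy's reshape method and then operates along one of the axes with argmax. I'm not sure it's elegant, but I expect it to perform well. I'm also assuming that the MultiIndexed pandas Series has a regular format, i.e. the index is a full Cartesian product of its levels, so every level has the same number of elements under every combination of the other levels.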
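The following example shows the reshape-then-argmax idea on a tiny hand-made array. The data is hypothetical and is not the question's series. The flat values are laid out in sorted (level 0, level 1, level 2) order. Reshaping them gives one axis per level, and `argmax` over the level-1 axis returns, for every (level 0, level 2) pair, the position of the max inside that group.

```python
import numpy as np

# Hypothetical toy data: 2 values of level 0, 3 of level 1, 2 of level 2,
# stored flat in (level 0, level 1, level 2) order.
flat = np.array([5, 1,   9, 2,   3, 7,    # first level-0 value
                 0, 8,   4, 6,   2, 1])   # second level-0 value

cube = flat.reshape(2, 3, 2)   # axes: (level 0, level 1, level 2)

# Reduce over level 1: one within-group position per (level 0, level 2) pair.
pos = cube.argmax(axis=1)
print(pos)
# [[1 2]
#  [1 0]]
```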

Here's the implementation:

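# Series.sortlevel() no longer exists in current pandas; sort_index() sorts by all levels the same way
L0,L1,L2 = s.index.levels[:3]
IDs = s.sort_index().values.reshape(-1,len(L0),len(L1),len(L2)).argmax(2)
sOut = pd.Series(IDs.ravel(),pd.MultiIndex.from_product([L0,L2]))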
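The following is a self-contained sketch of the same approach for a recent pandas version. It assumes the index is a full Cartesian product of its levels and checks that before reshaping. It drops the leading `-1` axis, so the argmax runs over `axis=1`. Note that positions are counted in sorted level-1 order. They agree with the row order inside each group here only because `'abcde'` is already sorted.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

L0, L1, L2 = s.index.levels[:3]

# The reshape only makes sense for a full, duplicate-free Cartesian product.
assert len(s) == len(L0) * len(L1) * len(L2)
assert not s.index.duplicated().any()

# After sort_index() the flat values follow (L0, L1, L2) order,
# so axis 1 of the cube is level 1.
cube = s.sort_index().to_numpy().reshape(len(L0), len(L1), len(L2))
IDs = cube.argmax(axis=1)

sOut = pd.Series(IDs.ravel(), pd.MultiIndex.from_product([L0, L2]))
print(sOut)
```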

Timings (added by pir):


Answer 1 (score: 4)

Here's how I did it:

s.groupby(level=[0, 2]).apply(lambda x: x.index.get_loc(x.idxmax()))

A  Four     2
   One      3
   Three    2
   Two      3
B  Four     3
   One      3
   Three    2
   Two      1
C  Four     1
   One      0
   Three    0
   Two      4
D  Four     1
   One      4
   Three    1
   Two      2
E  Four     2
   One      0
   Three    2
   Two      0
dtype: int64
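The following variant skips the label lookup. It is swapped in here and is not part of the answer. Calling NumPy's `argmax` on each group's raw values already returns the position within the group. Both versions return the first position if the max is tied. The sketch compares the two on the question's series.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

g = s.groupby(level=[0, 2])

# Label-based (this answer): find the max's label, then where it sits in the group.
by_label = g.apply(lambda x: x.index.get_loc(x.idxmax()))

# Position-based: argmax on the group's values is already the position.
by_position = g.apply(lambda x: x.to_numpy().argmax())

print(by_position)
```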