Partitioning pandas .diff() by a multi-index level

Time: 2015-04-29 13:04:52

Tags: pandas multi-index

My question is about calling .diff() within the partitions defined by one level of a multi-index.

In the example below, the output of the first df.diff() is:

               values
Greek English        
alpha a           NaN
      b             2
      c             2
      d             2
beta  e            11
      f             1
      g             1
      h             1

But I would like it to be:

               values
Greek English        
alpha a           NaN
      b             2
      c             2
      d             2
beta  e            NaN
      f             1
      g             1
      h             1

Here is a solution that uses a loop, but I think the loop should be avoidable:

import pandas as pd
import numpy as np

df = pd.DataFrame({'values': [1., 3., 5., 7., 18., 19., 20., 21.],
                   'Greek': ['alpha', 'alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta', 'beta'],
                   'English': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df.set_index(['Greek', 'English'], inplace=True)
print(df)

# (1.) This is not the type of .diff() I want.
# I need it to respect level='Greek' and restart at each new group.
print(df.diff())


# This is one way to achieve my desired result, but I have to think
# there is a way that does not involve the need to loop.
idx = pd.IndexSlice
for greek_letter in df.index.get_level_values('Greek').unique():
    # Assign through a single .loc call; chained indexing
    # (df.loc[...]['values'] = ...) would write to a temporary copy.
    df.loc[idx[greek_letter, :], 'values'] = df.loc[idx[greek_letter, :], 'values'].diff()

print(df)

1 answer:

Answer 0 (score: 10)

Just group by level=0 (or 'Greek', if you prefer), and then you can call diff on the values column:

In [179]: df.groupby(level=0)['values'].diff()
Out[179]:
Greek  English
alpha  a         NaN
       b           2
       c           2
       d           2
beta   e         NaN
       f           1
       g           1
       h           1
dtype: float64
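For completeness, here is a minimal self-contained sketch of the same idea that groups by the level name instead of its position and keeps the result next to the original data. The column name values_diff is only an illustrative choice and does not appear in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    'values': [1., 3., 5., 7., 18., 19., 20., 21.],
    'Greek': ['alpha'] * 4 + ['beta'] * 4,
    'English': list('abcdefgh'),
}).set_index(['Greek', 'English'])

# Group by the named index level so that diff() restarts in each group.
per_group = df.groupby(level='Greek')['values'].diff()

# Keep the result alongside the original data.
# 'values_diff' is a hypothetical column name chosen for this sketch.
df['values_diff'] = per_group
print(df)
```

The result keeps the original MultiIndex, so it can be assigned straight back to the frame without any loop or reindexing.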
