Pandas: get the previous row of a time series with a multi-level index

Time: 2016-06-09 13:39:32

Tags: python pandas time-series

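I have a DataFrame with two columns in the index: one is a label and the other is a time-series period. For each row I want the previous row in that label's time series. I can't use DataFrame.shift() directly, because with two index levels the shift moves values across labels and mixes them up.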

import pandas

# Desired behavior: each 'x' row needs its previous value, each 'y' row needs
# its previous value, etc. DON'T put the 'y' row's previous value on the 'x' row.
# The shift has to respect both index levels.
x = pandas.DataFrame({ 'label' : [ 'x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z' ],
     'period' : [ 1, 1, 1, 2, 2, 2, 3, 3, 3 ],
     'value' : [ '1st x', '1st y', '1st z', '2nd x', '2nd y', '2nd z', '3rd x', '3rd y', '3rd z' ]})
x.set_index(['label', 'period'], inplace=True)

#That looks like:
>>> x
             value
label period       
x     1       1st x
y     1       1st y
z     1       1st z
x     2       2nd x
y     2       2nd y
z     2       2nd z
x     3       3rd x
y     3       3rd y
z     3       3rd z

#I can't use x.shift(1) because that mixes the 'x' and 'y' values:
>>> x.shift(1)
              value
label period       
x     1         NaN
y     1       1st x  # WRONG! Should be NaN
z     1       1st y  # WRONG! Should be NaN
x     2       1st z  # WRONG! Should be '1st x'
y     2       2nd x  # WRONG! Should be '1st y'
z     2       2nd y  # WRONG! Should be '1st z'
x     3       2nd z  # WRONG! Should be '2nd x'
y     3       3rd x  # WRONG! Should be '2nd y'
z     3       3rd y  # WRONG! Should be '2nd z'

How do I get the correct previous row for each row?

2 answers:

Answer 0 (score: 3)

If you groupby on the first index level, then shift works as expected:

In [42]:
x.groupby(level='label').shift()

Out[42]:
              value
label period       
x     1         NaN
y     1         NaN
z     1         NaN
x     2       1st x
y     2       1st y
z     2       1st z
x     3       2nd x
y     3       2nd y
z     3       2nd z
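
To make the approach above easy to try, here is a minimal self-contained sketch that rebuilds the frame from the question and stores the shifted values next to the originals. The column name `prev_value` is an assumption made for this example. Because shift is positional within each group, the sketch also assumes that each label's rows are already in period order and that no periods are missing; otherwise sort first (for example with sort_index()).

```python
import pandas as pd

# Rebuild the frame from the question.
x = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': ['1st x', '1st y', '1st z',
              '2nd x', '2nd y', '2nd z',
              '3rd x', '3rd y', '3rd z'],
}).set_index(['label', 'period'])

# Shift within each label so values never cross from one label to another.
# 'prev_value' is a hypothetical column name chosen for this example.
x['prev_value'] = x.groupby(level='label')['value'].shift(1)

print(x)
```

The first period of every label has no predecessor, so its `prev_value` is missing (NaN).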

Answer 1 (score: 0)

Alternatively, if you want a more "readable" format, you can use DataFrame.unstack:

unstacked = df.unstack(level=0)
changes = unstacked.diff()

For the following data:

label period  value    
x     1       1
y     1       0
z     1       3
x     2       2
y     2       1
z     2       2
x     3       1
y     3       0
z     3       0

This produces:

       value
label      x    y    z
period
1        NaN  NaN  NaN
2        1.0  1.0 -1.0
3       -1.0 -1.0 -2.0
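
The snippet in this answer does not define `df`, so here is a minimal self-contained sketch that builds the numeric sample data shown above and reproduces the result. The last line, which calls shift on the unstacked frame, is an addition not present in the answer: it returns the previous period's value itself (what the question asks for) rather than the difference. The variable name `previous` is an assumption for this example.

```python
import pandas as pd

# Numeric sample data from this answer, indexed by (label, period).
df = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [1, 0, 3, 2, 1, 2, 1, 0, 0],
}).set_index(['label', 'period'])

# Move the label level into the columns: one row per period, one column per label.
unstacked = df.unstack(level='label')

# Change relative to the previous period, per label.
changes = unstacked.diff()

# Previous period's value, per label (hypothetical addition to the answer).
previous = unstacked.shift(1)

print(changes)
print(previous)
```

In the wide layout every column holds a single label, so a plain row-wise shift or diff cannot mix labels.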