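I have a DataFrame with two columns in its index: one is a label and the other is a time-series period. For each row, I want the previous row in that label's time series. But I can't use DataFrame.shift(), because the index has two levels and shifting mixes up the labels.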
#Desired behavior: each 'x' row needs its prev value, each 'y' row needs
#its prev value, etc. DON'T put the 'y' row's prev value on the 'x' row.
#Have to respect both columns on the index when shifting.
import pandas

x = pandas.DataFrame({'label':  ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                      'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                      'value':  ['1st x', '1st y', '1st z', '2nd x', '2nd y', '2nd z', '3rd x', '3rd y', '3rd z']})
x.set_index(['label', 'period'], inplace=True)
#That looks like:
>>> x
              value
label period
x     1       1st x
y     1       1st y
z     1       1st z
x     2       2nd x
y     2       2nd y
z     2       2nd z
x     3       3rd x
y     3       3rd y
z     3       3rd z
#I can't use x.shift(1) because that mixes the 'x' and 'y' values:
>>> x.shift(1)
              value
label period
x     1         NaN
y     1       1st x   ### WRONG! Should be NaN
z     1       1st y   ### WRONG! Should be NaN
x     2       1st z   ### WRONG! Should be '1st x'
y     2       2nd x   ### WRONG! Should be '1st y'
z     2       2nd y   ### WRONG! Should be '1st z'
x     3       2nd z   ### WRONG! Should be '2nd x'
y     3       3rd x   ### WRONG! Should be '2nd y'
z     3       3rd y   ### WRONG! Should be '2nd z'
How do I get the correct previous row for each row?
Answer 0 (score: 3)
If you groupby on the first index level (label), then shift works as expected, because each row is only shifted within its own label:
In [42]:
x.groupby(level='label').shift()
Out[42]:
              value
label period
x     1         NaN
y     1         NaN
z     1         NaN
x     2       1st x
y     2       1st y
z     2       1st z
x     3       2nd x
y     3       2nd y
z     3       2nd z
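A self-contained version of this answer, as a sketch: it rebuilds the question's frame (using the conventional pd alias) and puts the previous value next to the current one in a column named prev_value, a hypothetical name chosen for illustration. groupby(...).shift() moves rows in their existing order within each group, so the sketch assumes "previous" means "previous period" and calls sort_index() first to guarantee that order.

```python
import pandas as pd

# Rebuild the example frame from the question
x = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': ['1st x', '1st y', '1st z',
              '2nd x', '2nd y', '2nd z',
              '3rd x', '3rd y', '3rd z'],
}).set_index(['label', 'period'])

# Sort by (label, period) so that, within each label, rows are in period order
x = x.sort_index()

# Shift inside each 'label' group: a row only receives the value of the
# previous row with the same label; the first row of each group gets NaN
prev = x.groupby(level='label').shift(1)

# Show current and previous values side by side
# ('prev_value' is a hypothetical column name)
result = x.assign(prev_value=prev['value'])
print(result)
```

Because the result keeps the original MultiIndex, it can be joined or assigned back onto the source frame without any realignment.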
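Answer 1 (score: 0)
Alternatively, if you want a more "readable" layout, you can use DataFrame.unstack to turn each label into its own column and then work down the columns: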
unstacked = df.unstack(level=0)
changes = unstacked.diff()
For the following data:
label  period  value
x      1       1
y      1       0
z      1       3
x      2       2
y      2       1
z      2       2
x      3       1
y      3       0
z      3       0
this produces:
       value
label      x    y    z
period
1        NaN  NaN  NaN
2        1.0  1.0 -1.0
3       -1.0 -1.0 -2.0
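A runnable sketch of this second answer, assuming the value column in the table above holds integers. diff() gives the change from the previous period, not the previous value itself; calling shift(1) on the same unstacked frame gives the previous value per label, which is what the question asked for.

```python
import pandas as pd

# Sample data from the table in this answer (values assumed to be integers)
df = pd.DataFrame({
    'label': ['x', 'y', 'z'] * 3,
    'period': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [1, 0, 3, 2, 1, 2, 1, 0, 0],
}).set_index(['label', 'period'])

# Move 'label' (index level 0) into the columns: one row per period and
# one column per label, so each column is a single label's time series
unstacked = df.unstack(level=0)

# Change from the previous period, computed separately for each label
changes = unstacked.diff()

# Previous period's value for each label (NaN for the first period)
previous = unstacked.shift(1)

print(changes)
print(previous)
```

Because every label now lives in its own column, an ordinary row-wise shift or diff can no longer mix labels. This relies on the period index being sorted, which unstack does by default.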