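How do I perform an operation between consecutive rows in pandas?

Asked: 2019-01-28 21:03:00

Tags: python pandas

I'm new to machine learning and don't know how to do the following: I need to subtract two consecutive rows of the same column, but only if the "ID" column has the same value in both rows and their "Year" values are consecutive.

A sample of the table: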

           ID  Year  Revenues
0   180310781  2008  1730.119
1   180310781  2009  1710.073
2   180310781  2010  1653.428
3   180310781  2011  1608.061
4   180310781  2012   1350.84
12  756460796  2008   1061.78
13  756460796  2009  1045.337
14  756460796  2010         0
15  756460796  2011   675.333
16  756460796  2012   671.717 

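The desired result is a new column that shows 0 (or NaN, I don't mind) in the first row, since that is the first year of observation, then 1710.073 - 1730.119 in the second row, and so on until the rows for that ID run out.

2 Answers:

Answer 0 (score: 1)

You can use Series.shift to build a Boolean Series that checks both conditions, then assign the difference to the rows where that Series is True: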

s = (df.ID == df.ID.shift(1)) & (df.Year == df.Year.shift(1)+1)
df.loc[s, 'Diff'] = df.Revenues.diff()[s]

           ID  Year  Revenues      Diff
0   180310781  2008  1730.119       NaN
1   180310781  2009  1710.073   -20.046
2   180310781  2010  1653.428   -56.645
3   180310781  2011  1608.061   -45.367
4   180310781  2012  1350.840  -257.221
12  756460796  2008  1061.780       NaN
13  756460796  2009  1045.337   -16.443
14  756460796  2010     0.000 -1045.337
15  756460796  2011   675.333   675.333
16  756460796  2012   671.717    -3.616
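
For reference, here is a self-contained sketch of the same idea that rebuilds the sample table from the question. The DataFrame construction and the variable names `same_id`, `next_year` and `mask` are assumptions for illustration, and `Series.where` is used in place of the `.loc` assignment above; it should give the same `Diff` column.

```python
import pandas as pd

# Sample data copied from the question (index labels kept as shown there)
df = pd.DataFrame(
    {
        "ID": [180310781] * 5 + [756460796] * 5,
        "Year": [2008, 2009, 2010, 2011, 2012] * 2,
        "Revenues": [1730.119, 1710.073, 1653.428, 1608.061, 1350.84,
                     1061.78, 1045.337, 0.0, 675.333, 671.717],
    },
    index=[0, 1, 2, 3, 4, 12, 13, 14, 15, 16],
)

# True where the previous row has the same ID ...
same_id = df["ID"] == df["ID"].shift(1)
# ... and the year is exactly one later than the previous row's year
next_year = df["Year"] == df["Year"].shift(1) + 1
mask = same_id & next_year

# Row-to-row difference, kept only where both conditions hold (NaN elsewhere)
df["Diff"] = df["Revenues"].diff().where(mask)
print(df)
```

This relies on the rows already being ordered by ID and then by Year, as they are in the sample.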

Answer 1 (score: 1)

df['Diff'] = df.groupby('ID', group_keys=False) \
                 .apply(lambda x: x['Revenues'].diff())

Output

          ID  Year  Revenues      Diff
0  180310781  2008  1730.119       NaN
1  180310781  2009  1710.073   -20.046
2  180310781  2010  1653.428   -56.645
3  180310781  2011  1608.061   -45.367
4  180310781  2012  1350.840  -257.221
5  756460796  2008  1061.780       NaN
6  756460796  2009  1045.337   -16.443
7  756460796  2010     0.000 -1045.337
8  756460796  2011   675.333   675.333
9  756460796  2012   671.717    -3.616
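
Note that grouping by ID alone does not check that the years are consecutive; it happens to give the expected result here because the sample has no missing years. Below is a minimal sketch of one way to add that check, assuming the rows are sorted by Year within each ID. The small table with a missing 2010 row is hypothetical data made up for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical data: ID 1 has no row for 2010, so 2011 should not get a difference
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Year": [2008, 2009, 2011, 2008, 2009],
    "Revenues": [100.0, 90.0, 70.0, 50.0, 55.0],
})

grouped = df.groupby("ID")

# Per-ID differences; the first row of each ID is NaN
rev_diff = grouped["Revenues"].diff()
year_gap = grouped["Year"].diff()

# Keep the revenue difference only where the year advanced by exactly 1
df["Diff"] = rev_diff.where(year_gap == 1)
print(df)
```

Selecting the column before calling `diff()` also avoids the `apply` with a lambda, which should be equivalent for this case.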