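I'm new to machine learning and I don't know how to do the following: I need to subtract two consecutive rows of the same column, but only if the rows have the same value in the "ID" column and their "Year" values are consecutive.
A sample of the table: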
           ID  Year  Revenues
0   180310781  2008  1730.119
1   180310781  2009  1710.073
2   180310781  2010  1653.428
3   180310781  2011  1608.061
4   180310781  2012  1350.84
12  756460796  2008  1061.78
13  756460796  2009  1045.337
14  756460796  2010  0
15  756460796  2011  675.333
16  756460796  2012  671.717
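The expected result is a new column that shows 0 (or NaN, I don't mind) in the first row, because it is the first observed year, then 1710.073 - 1730.119 in the second row, and so on until the rows for that ID run out.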
Answer 0 (score: 1)
You can use .shift to create a Boolean Series that checks the conditions, then assign the difference to the rows where that Series is True:
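# True where the previous row has the same ID and the preceding year
s = (df.ID == df.ID.shift(1)) & (df.Year == df.Year.shift(1)+1)
# Assign the row-to-row difference only on those rows; the rest stay NaN
df.loc[s, 'Diff'] = df.Revenues.diff()[s]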
           ID  Year  Revenues      Diff
0   180310781  2008  1730.119       NaN
1   180310781  2009  1710.073   -20.046
2   180310781  2010  1653.428   -56.645
3   180310781  2011  1608.061   -45.367
4   180310781  2012  1350.840  -257.221
12  756460796  2008  1061.780       NaN
13  756460796  2009  1045.337   -16.443
14  756460796  2010     0.000 -1045.337
15  756460796  2011   675.333   675.333
16  756460796  2012   671.717    -3.616
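The sample data has no missing years, so the output above never exercises the "consecutive years" part of the condition. Below is a minimal, self-contained sketch of the same shift-based idea. It assumes the question's data with the 2010 row of the first ID removed (an alteration made here purely for illustration) and uses `Series.where` instead of `.loc` assignment; the names `same_id`, `next_year` and `mask` are illustrative.

```python
import pandas as pd

# Data from the question, with the 2010 row of the first ID dropped
# (an assumption made here to create a gap in the years).
df = pd.DataFrame({
    "ID": [180310781] * 4 + [756460796] * 5,
    "Year": [2008, 2009, 2011, 2012, 2008, 2009, 2010, 2011, 2012],
    "Revenues": [1730.119, 1710.073, 1608.061, 1350.84,
                 1061.78, 1045.337, 0.0, 675.333, 671.717],
})

# True only where the previous row has the same ID and the preceding year.
same_id = df["ID"].eq(df["ID"].shift())
next_year = df["Year"].eq(df["Year"].shift() + 1)
mask = same_id & next_year

# Keep the row-to-row difference where the mask holds, NaN elsewhere.
df["Diff"] = df["Revenues"].diff().where(mask)
print(df)
```

With this input, the 2011 row of the first ID should come out as NaN, because its previous row is 2009 rather than 2010. Like the answer above, the sketch relies on the rows already being ordered by ID and Year.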
Answer 1 (score: 1)
# Difference within each ID; Year is not checked, so this assumes the rows
# are sorted by Year and no year is missing
df['Diff'] = df.groupby('ID')['Revenues'].diff()
Output
          ID  Year  Revenues      Diff
0  180310781  2008  1730.119       NaN
1  180310781  2009  1710.073   -20.046
2  180310781  2010  1653.428   -56.645
3  180310781  2011  1608.061   -45.367
4  180310781  2012  1350.840  -257.221
5  756460796  2008  1061.780       NaN
6  756460796  2009  1045.337   -16.443
7  756460796  2010     0.000 -1045.337
8  756460796  2011   675.333   675.333
9  756460796  2012   671.717    -3.616
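The groupby approach only separates the IDs; it does not look at Year. A possible variant that also enforces the consecutive-year condition is sketched below. It assumes a small, deliberately unsorted subset of the question's data with 2010 missing for the first ID, and it sorts by ID and Year first; the names `grouped`, `rev_diff` and `year_diff` are illustrative.

```python
import pandas as pd

# Unsorted subset of the question's data; the first ID has no 2010 row
# (both choices are assumptions made here for illustration).
df = pd.DataFrame({
    "ID": [756460796, 180310781, 180310781, 756460796, 180310781],
    "Year": [2009, 2008, 2011, 2008, 2009],
    "Revenues": [1045.337, 1730.119, 1608.061, 1061.78, 1710.073],
})

# Order the rows so that diff() compares each year with the one before it.
df = df.sort_values(["ID", "Year"])

grouped = df.groupby("ID")
rev_diff = grouped["Revenues"].diff()   # NaN on the first row of each ID
year_diff = grouped["Year"].diff()      # 1 only when the years are consecutive

# Keep the revenue difference only where the year step is exactly 1.
df["Diff"] = rev_diff.where(year_diff.eq(1))
print(df)
```

Because the differences are computed per group, the first row of each ID is NaN without an explicit ID comparison, and the extra `year_diff` check blanks out rows that follow a gap.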