I have the following:
import pandas as pd
data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]
dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])
  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21
2  AAA  2019-02-01     30
3  AAA  2019-02-02     45
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     80
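I want to generate a new column that holds the difference in VALUE between the current row and the previous row, computed separately for each NAME.
So the output would look like this: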
  NAME   TIMESTAMP  VALUE  DIFF
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21    11
2  AAA  2019-02-01     30     9
3  AAA  2019-02-02     45    15
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60    10
6  BBB  2019-02-01     70    10
7  BBB  2019-02-02     80    10
Thanks.
Answer 0: (score: 1)
You can do it like this:
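# transform returns a result aligned to dfx's original index, so the assignment lines up row by row
dfx['DIFF'] = dfx.groupby('NAME')['VALUE'].transform(lambda x: x - x.shift()).fillna(0)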
print(dfx)
Output:
  NAME   TIMESTAMP  VALUE  DIFF
0  AAA  2019-01-01     10   0.0
1  AAA  2019-01-02     21  11.0
2  AAA  2019-02-01     30   9.0
3  AAA  2019-02-02     45  15.0
4  BBB  2019-01-01     50   0.0
5  BBB  2019-01-02     60  10.0
6  BBB  2019-02-01     70  10.0
7  BBB  2019-02-02     80  10.0
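One caveat: a row-to-row difference depends on row order, and the sample data happens to be sorted by TIMESTAMP within each NAME already. If your real data might not be, a minimal sketch (the shuffled rows and the names `shuffled` and `ordered` are made up for illustration) is to sort before taking the difference:

```python
import pandas as pd

# Same rows as in the question, deliberately shuffled to simulate unsorted input.
data = [['BBB', '2019-02-01', 70], ['AAA', '2019-01-02', 21],
        ['AAA', '2019-01-01', 10], ['BBB', '2019-01-01', 50],
        ['AAA', '2019-02-02', 45], ['BBB', '2019-02-02', 80],
        ['AAA', '2019-02-01', 30], ['BBB', '2019-01-02', 60]]
shuffled = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])

# Parse the strings so the sort is chronological rather than lexical.
shuffled['TIMESTAMP'] = pd.to_datetime(shuffled['TIMESTAMP'])

# Sort within each NAME, then take the per-group difference.
ordered = shuffled.sort_values(['NAME', 'TIMESTAMP']).reset_index(drop=True)
ordered['DIFF'] = ordered.groupby('NAME')['VALUE'].diff().fillna(0)
print(ordered)
```

Note that `fillna(0)` makes the first row of each group read as "no change", which is not quite the blank cell shown in the question. Drop it if you would rather keep those rows as missing.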
Answer 1: (score: 1)
A simpler solution:
dfx['DIFF'] = dfx.groupby('NAME')['VALUE'].diff()
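For completeness, here is a self-contained sketch of the per-group diff on the question's data. It selects VALUE explicitly so the string TIMESTAMP column is left out of the subtraction, and it leaves the first row of each group as missing, which matches the blank cells in the desired output. The cast to the nullable `Int64` dtype is my own addition, only there to keep the remaining values as integers instead of floats:

```python
import pandas as pd

data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 21],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 45],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 80]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])

# Select VALUE explicitly so only the numeric column is diffed.
diff = dfx.groupby('NAME')['VALUE'].diff()

# The first row of each group has no predecessor, so it stays missing.
# The nullable Int64 dtype can hold those gaps alongside whole numbers.
dfx['DIFF'] = diff.astype('Int64')
print(dfx)
```

If a plain float column with NaN is acceptable, the `astype` step can be skipped.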