I have this dataframe:
import pandas as pd
df = pd.DataFrame([{"state": "CA", "total": 2, "week": 10}, {"state": "UT", "total": 7, "week": 10}, {"state": "CA", "total": 14, "week": 11}, {"state": "UT", "total": 18, "week": 11}, {"state": "CA", "total": 21, "week": 12}, {"state": "UT", "total": 30, "week": 12}])
The total field is cumulative, and I want the week-over-week difference for each state. So I want to end up with this:
state,total,week,diff
CA,2,10,NaN
UT,7,10,NaN
CA,14,11,12
UT,18,11,11
CA,21,12,7
UT,30,12,12
How do I get from here to there? I could do it by iterating over the rows, but I don't know where to start doing it in pandas.
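For comparison, the row-iteration approach the question mentions might look roughly like the sketch below. It assumes the rows are already ordered by week within each state, and the names last_total and diffs are purely illustrative. It remembers the last cumulative total seen for each state and subtracts it from the current one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# Hypothetical helper state: last cumulative total seen per state.
# Assumes rows are already sorted by week within each state.
last_total = {}
diffs = []
for state, total in zip(df["state"], df["total"]):
    prev = last_total.get(state)
    # The first week of each state has no previous value, so its diff is NaN
    diffs.append(np.nan if prev is None else total - prev)
    last_total[state] = total

df["diff"] = diffs
print(df)
```

This works, but a Python-level loop scales poorly on large frames, which is why a vectorized groupby is usually preferred.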
Answer (score: 4)
You can do it with groupby and diff:
df['diff'] = df.groupby('state')['total'].diff()
df
Out:
state total week diff
0 CA 2 10 NaN
1 UT 7 10 NaN
2 CA 14 11 12.0
3 UT 18 11 11.0
4 CA 21 12 7.0
5 UT 30 12 12.0
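Note that diff() subtracts the previous row within each group in the frame's current row order, so the result above relies on the rows already being ordered by week. If the input might arrive unordered, a minimal sketch under that assumption is to sort by state and week first, take the grouped diff, then restore the original order by index. The shuffle below only simulates unordered input and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# Simulate input that is not ordered by week (illustrative only)
shuffled = df.sample(frac=1, random_state=0)

# Sort so that "previous row in the group" means "previous week"
ordered = shuffled.sort_values(["state", "week"])
ordered = ordered.assign(diff=ordered.groupby("state")["total"].diff())

# Put rows back in their original index order
result = ordered.sort_index()
print(result)
```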
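Because diff() produces NaN for the first week of each state, the column is upcast to float. Since pandas 0.24 you can use nullable integer types to keep it as integers, though they are not yet commonly used: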
df['diff'] = df.groupby('state')['total'].diff().astype(pd.Int64Dtype())
df
Out:
state total week diff
0 CA 2 10 <NA>
1 UT 7 10 <NA>
2 CA 14 11 12
3 UT 18 11 11
4 CA 21 12 7
5 UT 30 12 12
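The string alias "Int64" (capital I) names the same nullable integer dtype as pd.Int64Dtype(), so the cast can also be written as below. This sketch assumes pandas 1.0 or later, where missing values in this dtype are represented by pd.NA and display as <NA>, matching the output above.

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# "Int64" is the string alias for pd.Int64Dtype(); NaN becomes <NA>
df["diff"] = df.groupby("state")["total"].diff().astype("Int64")

print(df["diff"].dtype)
print(df)
```

Using a capital-I "Int64" matters: lowercase "int64" is the NumPy dtype, which cannot hold missing values, so casting this column to it would raise an error.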