Pandas: calculating the difference between rows

Time: 2020-11-04 00:11:54

Tags: python pandas

I have this DataFrame:

import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

The total field is cumulative, and I want the week-over-week difference for each state. So I want to end up with this:

state,total,week,diff
CA,2,10,NaN
UT,7,10,NaN
CA,14,11,12
UT,18,11,11
CA,21,12,7
UT,30,12,12

How do I get from here to there? I could do it by iterating over the rows, but I don't know where to start doing this in pandas.

1 answer:

Answer 0: (score: 4)

You can group by state and take the difference between consecutive rows within each group:

df['diff'] = df.groupby('state')['total'].diff()
df

Out:

  state  total  week  diff
0    CA      2    10   NaN
1    UT      7    10   NaN
2    CA     14    11  12.0
3    UT     18    11  11.0
4    CA     21    12   7.0
5    UT     30    12  12.0
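
As a minimal sketch of what the grouped diff() is doing, the same column can be computed by subtracting each state's previous total explicitly with shift(). The sort by week and the column name diff_via_shift are assumptions added here for illustration and are not part of the original answer; the sort only matters if the rows are not already in week order.

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# diff() subtracts the previous row within each group, so row order matters.
# Sorting by week first is an added precaution (an assumption of this sketch).
ordered = df.sort_values("week")

# The result keeps the original index, so assignment aligns back onto df.
df["diff"] = ordered.groupby("state")["total"].diff()

# Equivalent formulation: current total minus the same state's previous total.
# "diff_via_shift" is a hypothetical column name used only for comparison.
previous_total = ordered.groupby("state")["total"].shift()
df["diff_via_shift"] = df["total"] - previous_total

print(df)
```

Both columns should hold the same values; the first row of each state has no predecessor, so it stays missing.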

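The diff column comes out as float because NaN cannot be stored in a regular integer column. As of pandas 0.24, you can use the nullable int types to keep the values as integers, although these are not commonly used: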

df['diff'] = df.groupby('state')['total'].diff().astype(pd.Int64Dtype())
df

Out:

  state  total  week  diff
0    CA      2    10  <NA>
1    UT      7    10  <NA>
2    CA     14    11    12
3    UT     18    11    11
4    CA     21    12     7
5    UT     30    12    12
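
For reference, the same cast can also be written with the string alias "Int64" (capital I), which refers to the same nullable dtype as pd.Int64Dtype(). The sketch below is illustrative only and assumes a pandas version that supports the nullable integer dtype; it checks the resulting dtype and counts the missing entries.

```python
import pandas as pd

df = pd.DataFrame([
    {"state": "CA", "total": 2, "week": 10},
    {"state": "UT", "total": 7, "week": 10},
    {"state": "CA", "total": 14, "week": 11},
    {"state": "UT", "total": 18, "week": 11},
    {"state": "CA", "total": 21, "week": 12},
    {"state": "UT", "total": 30, "week": 12},
])

# "Int64" (capital I) is the string alias for the nullable integer dtype;
# lowercase "int64" is the plain NumPy dtype and cannot hold missing values.
df["diff"] = df.groupby("state")["total"].diff().astype("Int64")

# The dtype is the nullable extension type, not float64.
print(df["diff"].dtype)

# The first week of each state has no previous row, so it is missing.
missing_count = int(df["diff"].isna().sum())
print(missing_count)
```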