pandas groupby and subtract the first value of one column from the last value of another column

Time: 2019-12-05 08:03:29

Tags: python pandas dataframe group-by

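I am trying to add a new column that contains the difference between the first value of one column and the last value of another column, per group. I am using this command: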

df['diff']=df.groupby(['T_Id'])['EndMeterReading'].max()-df['StartMeterReading'].min()

But it fills the new column with NaN.

How can I get the result I want?
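The NaN values come from index alignment. df.groupby(['T_Id'])[...].max() returns a Series indexed by the group labels (4716a, 4716b), while df has a default integer index, so assigning that Series to a column matches no rows. Note also that df['StartMeterReading'].min() is a single minimum over the whole frame, not a per-group value. A minimal sketch reproducing the NaN on a hypothetical four-row subset of the data, using the column names from the tables (StartReading/EndReading, assumed to be what the question's StartMeterReading/EndMeterReading refer to):

```python
import pandas as pd

# Hypothetical four-row subset of the question's data; df gets a default index 0..3
df = pd.DataFrame({
    "T_Id": ["4716a", "4716a", "4716b", "4716b"],
    "StartReading": [4323.17, 4328.85, 4346.65, 4372.67],
    "EndReading": [4324.80, 4330.73, 4347.62, 4372.77],
})

# The aggregated Series is indexed by the group labels, not by df's row index
per_group = df.groupby("T_Id")["EndReading"].max() - df["StartReading"].min()
print(per_group.index.tolist())  # ['4716a', '4716b']

# Column assignment aligns on df's integer index, finds no matching labels -> NaN
df["diff"] = per_group
print(df["diff"].isna().all())  # True
```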

Original DataFrame:

+------+-------+--------------+------------+
| D_Id | T_Id  | StartReading | EndReading |
+------+-------+--------------+------------+
|    1 | 4716a |      4323.17 |     4324.8 |
|    1 | 4716a |      4324.96 |    4325.34 |
|    1 | 4716a |      4326.47 |    4327.22 |
|    1 | 4716a |       4327.4 |    4328.43 |
|    1 | 4716a |      4328.85 |    4330.73 |
|    1 | 4716b |      4346.65 |    4347.62 |
|    1 | 4716b |      4347.67 |    4349.88 |
|    1 | 4716b |      4351.62 |    4351.83 |
|    1 | 4716b |      4352.88 |    4354.32 |
|    1 | 4716b |      4354.93 |    4355.14 |
|    1 | 4716b |       4355.2 |    4355.82 |
|    1 | 4716b |      4356.91 |    4357.37 |
|    1 | 4716b |      4357.74 |    4358.26 |
|    1 | 4716b |      4359.89 |    4360.46 |
|    1 | 4716b |      4360.61 |    4361.43 |
|    1 | 4716b |      4361.47 |    4362.11 |
|    1 | 4716b |      4362.88 |    4368.49 |
|    1 | 4716b |      4368.94 |    4369.78 |
|    1 | 4716b |      4370.91 |    4371.25 |
|    1 | 4716b |      4372.67 |    4372.77 |
+------+-------+--------------+------------+

Desired output:

+------+-------+--------------+------------+------------------+
| D_Id | T_Id  | StartReading | EndReading |       Diff       |
+------+-------+--------------+------------+------------------+
|    1 | 4716a |      4323.17 |     4324.8 |             7.56 |
|    1 | 4716a |      4324.96 |    4325.34 |             7.56 |
|    1 | 4716a |      4326.47 |    4327.22 |             7.56 |
|    1 | 4716a |       4327.4 |    4328.43 |             7.56 |
|    1 | 4716a |      4328.85 |    4330.73 |             7.56 |
|    1 | 4716b |      4346.65 |    4347.62 |            26.12 |
|    1 | 4716b |      4347.67 |    4349.88 |            26.12 |
|    1 | 4716b |      4351.62 |    4351.83 |            26.12 |
|    1 | 4716b |      4352.88 |    4354.32 |            26.12 |
|    1 | 4716b |      4354.93 |    4355.14 |            26.12 |
|    1 | 4716b |       4355.2 |    4355.82 |            26.12 |
|    1 | 4716b |      4356.91 |    4357.37 |            26.12 |
|    1 | 4716b |      4357.74 |    4358.26 |            26.12 |
|    1 | 4716b |      4359.89 |    4360.46 |            26.12 |
|    1 | 4716b |      4360.61 |    4361.43 |            26.12 |
|    1 | 4716b |      4361.47 |    4362.11 |            26.12 |
|    1 | 4716b |      4362.88 |    4368.49 |            26.12 |
|    1 | 4716b |      4368.94 |    4369.78 |            26.12 |
|    1 | 4716b |      4370.91 |    4371.25 |            26.12 |
|    1 | 4716b |      4372.67 |    4372.77 |            26.12 |
+------+-------+--------------+------------+------------------+
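The Diff values in the desired output are each group's last EndReading minus its first StartReading, worked out from the rows above:

```latex
\begin{aligned}
\text{4716a}: &\quad 4330.73 - 4323.17 = 7.56 \\
\text{4716b}: &\quad 4372.77 - 4346.65 = 26.12
\end{aligned}
```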

2 answers:

Answer 0 (score: 2)

Use GroupBy.transform with the max and min functions to get Series of the same size as the original DataFrame, so they can be subtracted correctly:

df['diff']= (df.groupby('T_Id')['EndReading'].transform('max')-
             df.groupby('T_Id')['StartReading'].transform('min'))

print (df)
    D_Id   T_Id  StartReading  EndReading   diff
0      1  4716a       4323.17     4324.80   7.56
1      1  4716a       4324.96     4325.34   7.56
2      1  4716a       4326.47     4327.22   7.56
3      1  4716a       4327.40     4328.43   7.56
4      1  4716a       4328.85     4330.73   7.56
5      1  4716b       4346.65     4347.62  26.12
6      1  4716b       4347.67     4349.88  26.12
7      1  4716b       4351.62     4351.83  26.12
8      1  4716b       4352.88     4354.32  26.12
9      1  4716b       4354.93     4355.14  26.12
10     1  4716b       4355.20     4355.82  26.12
11     1  4716b       4356.91     4357.37  26.12
12     1  4716b       4357.74     4358.26  26.12
13     1  4716b       4359.89     4360.46  26.12
14     1  4716b       4360.61     4361.43  26.12
15     1  4716b       4361.47     4362.11  26.12
16     1  4716b       4362.88     4368.49  26.12
17     1  4716b       4368.94     4369.78  26.12
18     1  4716b       4370.91     4371.25  26.12
19     1  4716b       4372.67     4372.77  26.12
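max and min match the title's "last minus first" only when readings increase within each group, which holds for this sample. If that is not guaranteed, a variant using transform('last') and transform('first') follows the title literally. This is a sketch, not part of the original answer, and it assumes rows are already in chronological order within each T_Id:

```python
import pandas as pd

# Full sample data from the question
df = pd.DataFrame({
    "D_Id": [1] * 20,
    "T_Id": ["4716a"] * 5 + ["4716b"] * 15,
    "StartReading": [4323.17, 4324.96, 4326.47, 4327.40, 4328.85,
                     4346.65, 4347.67, 4351.62, 4352.88, 4354.93,
                     4355.20, 4356.91, 4357.74, 4359.89, 4360.61,
                     4361.47, 4362.88, 4368.94, 4370.91, 4372.67],
    "EndReading": [4324.80, 4325.34, 4327.22, 4328.43, 4330.73,
                   4347.62, 4349.88, 4351.83, 4354.32, 4355.14,
                   4355.82, 4357.37, 4358.26, 4360.46, 4361.43,
                   4362.11, 4368.49, 4369.78, 4371.25, 4372.77],
})

g = df.groupby("T_Id")
# transform broadcasts each group's last/first value back onto every row of that group
df["diff"] = g["EndReading"].transform("last") - g["StartReading"].transform("first")

# The max/min version from the answer, for comparison
max_min = g["EndReading"].transform("max") - g["StartReading"].transform("min")
print(df)
```

On this sample both versions produce the same column (7.56 for 4716a, 26.12 for 4716b).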

Answer 1 (score: 0)

Use groupby with first and last to compute the difference per T_Id, then merge it back into the original df:

df2 = df.groupby(['T_Id']).agg({'StartReading' : 'first', 'EndReading' : 'last'}).reset_index(0)
df2['Diff'] = df2['EndReading'] - df2['StartReading']
df = df.merge(df2[['T_Id', 'Diff']], how='left', on='T_Id')
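Instead of building df2 and merging, the per-group differences can also be attached with Series.map, which looks up each row's T_Id in the index of the aggregated Series and keeps df's original index (a merge on columns returns a new default index). A minimal sketch on a hypothetical four-row subset of the data, not part of the original answer:

```python
import pandas as pd

# Hypothetical four-row subset of the question's data
df = pd.DataFrame({
    "T_Id": ["4716a", "4716a", "4716b", "4716b"],
    "StartReading": [4323.17, 4328.85, 4346.65, 4372.67],
    "EndReading": [4324.80, 4330.73, 4347.62, 4372.77],
})

# One row per T_Id: first StartReading and last EndReading of each group
agg = df.groupby("T_Id").agg({"StartReading": "first", "EndReading": "last"})
diff_by_id = agg["EndReading"] - agg["StartReading"]  # Series indexed by T_Id

# map replaces each T_Id with its group's difference; df's row order and index are kept
df["Diff"] = df["T_Id"].map(diff_by_id)
print(df)
```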