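I am trying to add a new column containing the difference between the first value of one column and the last value of another column. I am using this command: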
df['diff']=df.groupby(['T_Id'])['EndMeterReading'].max()-df['StartMeterReading'].min()
But it fills the new column with NaN.
How can I get the result I want?
Original DataFrame:
+------+-------+--------------+------------+
| D_Id | T_Id | StartReading | EndReading |
+------+-------+--------------+------------+
| 1 | 4716a | 4323.17 | 4324.8 |
| 1 | 4716a | 4324.96 | 4325.34 |
| 1 | 4716a | 4326.47 | 4327.22 |
| 1 | 4716a | 4327.4 | 4328.43 |
| 1 | 4716a | 4328.85 | 4330.73 |
| 1 | 4716b | 4346.65 | 4347.62 |
| 1 | 4716b | 4347.67 | 4349.88 |
| 1 | 4716b | 4351.62 | 4351.83 |
| 1 | 4716b | 4352.88 | 4354.32 |
| 1 | 4716b | 4354.93 | 4355.14 |
| 1 | 4716b | 4355.2 | 4355.82 |
| 1 | 4716b | 4356.91 | 4357.37 |
| 1 | 4716b | 4357.74 | 4358.26 |
| 1 | 4716b | 4359.89 | 4360.46 |
| 1 | 4716b | 4360.61 | 4361.43 |
| 1 | 4716b | 4361.47 | 4362.11 |
| 1 | 4716b | 4362.88 | 4368.49 |
| 1 | 4716b | 4368.94 | 4369.78 |
| 1 | 4716b | 4370.91 | 4371.25 |
| 1 | 4716b | 4372.67 | 4372.77 |
+------+-------+--------------+------------+
Desired output:
+------+-------+--------------+------------+------------------+
| D_Id | T_Id | StartReading | EndReading | Diff |
+------+-------+--------------+------------+------------------+
| 1 | 4716a | 4323.17 | 4324.8 | 7.56 |
| 1 | 4716a | 4324.96 | 4325.34 | 7.56 |
| 1 | 4716a | 4326.47 | 4327.22 | 7.56 |
| 1 | 4716a | 4327.4 | 4328.43 | 7.56 |
| 1 | 4716a | 4328.85 | 4330.73 | 7.56 |
| 1 | 4716b | 4346.65 | 4347.62 | 26.12 |
| 1 | 4716b | 4347.67 | 4349.88 | 26.12 |
| 1 | 4716b | 4351.62 | 4351.83 | 26.12 |
| 1 | 4716b | 4352.88 | 4354.32 | 26.12 |
| 1 | 4716b | 4354.93 | 4355.14 | 26.12 |
| 1 | 4716b | 4355.2 | 4355.82 | 26.12 |
| 1 | 4716b | 4356.91 | 4357.37 | 26.12 |
| 1 | 4716b | 4357.74 | 4358.26 | 26.12 |
| 1 | 4716b | 4359.89 | 4360.46 | 26.12 |
| 1 | 4716b | 4360.61 | 4361.43 | 26.12 |
| 1 | 4716b | 4361.47 | 4362.11 | 26.12 |
| 1 | 4716b | 4362.88 | 4368.49 | 26.12 |
| 1 | 4716b | 4368.94 | 4369.78 | 26.12 |
| 1 | 4716b | 4370.91 | 4371.25 | 26.12 |
| 1 | 4716b | 4372.67 | 4372.77 | 26.12 |
+------+-------+--------------+------------+------------------+
Answer 0 (score: 2)
Use GroupBy.transform with the max and min functions. Each call returns a Series the same size as the original DataFrame, so the two can be subtracted correctly:
df['diff']= (df.groupby('T_Id')['EndReading'].transform('max')-
df.groupby('T_Id')['StartReading'].transform('min'))
print (df)
D_Id T_Id StartReading EndReading diff
0 1 4716a 4323.17 4324.80 7.56
1 1 4716a 4324.96 4325.34 7.56
2 1 4716a 4326.47 4327.22 7.56
3 1 4716a 4327.40 4328.43 7.56
4 1 4716a 4328.85 4330.73 7.56
5 1 4716b 4346.65 4347.62 26.12
6 1 4716b 4347.67 4349.88 26.12
7 1 4716b 4351.62 4351.83 26.12
8 1 4716b 4352.88 4354.32 26.12
9 1 4716b 4354.93 4355.14 26.12
10 1 4716b 4355.20 4355.82 26.12
11 1 4716b 4356.91 4357.37 26.12
12 1 4716b 4357.74 4358.26 26.12
13 1 4716b 4359.89 4360.46 26.12
14 1 4716b 4360.61 4361.43 26.12
15 1 4716b 4361.47 4362.11 26.12
16 1 4716b 4362.88 4368.49 26.12
17 1 4716b 4368.94 4369.78 26.12
18 1 4716b 4370.91 4371.25 26.12
19 1 4716b 4372.67 4372.77 26.12
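To illustrate why the command in the question yields NaN, here is a minimal sketch on a small made-up sample (the four rows and the `aligned` column are assumptions for demonstration, not the asker's full data). An aggregation such as `max()` returns one row per group, indexed by `T_Id`, and assigning that to a column aligns on index labels, none of which match the row index. `transform` instead broadcasts each group's result back onto the original rows.

```python
import pandas as pd

# Small hypothetical sample using the column names from the question's table.
df = pd.DataFrame({
    "T_Id": ["4716a", "4716a", "4716b", "4716b"],
    "StartReading": [4323.17, 4328.85, 4346.65, 4372.67],
    "EndReading": [4324.80, 4330.73, 4347.62, 4372.77],
})

# Aggregating gives one value per group, indexed by the group key T_Id.
per_group = df.groupby("T_Id")["EndReading"].max()
print(per_group.index.tolist())

# Column assignment aligns on index labels; the row labels 0..3 do not
# appear in per_group's index, so every row becomes NaN.
df["aligned"] = per_group
print(df["aligned"].isna().all())

# transform returns a Series with the same index as df, so subtraction
# and assignment line up row by row.
g = df.groupby("T_Id")
df["diff"] = g["EndReading"].transform("max") - g["StartReading"].transform("min")
print(df)
```

Reusing the same `groupby` object for both columns is only a convenience; calling `df.groupby('T_Id')` twice, as in the answer above, should behave the same way.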
Answer 1 (score: 0)
Use groupby with first and last to find the values, then merge back into the original df:
df2 = df.groupby(['T_Id']).agg({'StartReading' : 'first', 'EndReading' : 'last'}).reset_index(0)
df2['Diff'] = df2['EndReading'] - df2['StartReading']
# merge returns a new DataFrame, so assign the result
df = df.merge(df2[['T_Id', 'Diff']], how='left', on='T_Id')
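Note that first/last and min/max only agree when the readings increase within each group, as they do in the question's data. The sketch below uses made-up, deliberately unordered values (the column names `diff_first_last` and `diff_min_max` are also just illustrative) to show where the two approaches diverge, and that `transform` also accepts 'first' and 'last', which avoids the merge step.

```python
import pandas as pd

# Hypothetical group whose readings are NOT in increasing order.
df = pd.DataFrame({
    "T_Id": ["x", "x", "x"],
    "StartReading": [10.0, 5.0, 20.0],
    "EndReading": [12.0, 30.0, 25.0],
})

g = df.groupby("T_Id")

# Positional: last EndReading minus first StartReading in each group.
df["diff_first_last"] = (
    g["EndReading"].transform("last") - g["StartReading"].transform("first")
)

# Value-based: largest EndReading minus smallest StartReading in each group.
df["diff_min_max"] = (
    g["EndReading"].transform("max") - g["StartReading"].transform("min")
)

print(df)
```

If the rows are not already in chronological order, sorting before grouping is probably needed for the first/last variant to mean what the question intends.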