I tried a conditional replacement, but it did not do what I need:
df.loc[df['Temperature1'] > 50, 'Temperature'] = 23
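This changes every element greater than 50 to 23. What I actually want is to compare two consecutive rows, check whether their difference is greater than 10, and replace the value only in that case.
Answer 0 (score: 0)
Edit: added an example using a rolling window (see also: window functions).
You can use shift() to place the values of the previous row and the next row alongside the current row.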
import pandas as pd
df = pd.DataFrame({'Temperature': [10,30,20,40,50]})
df['upper_row'] = df['Temperature'].shift()
df['lower_row'] = df['Temperature'].shift(-1)
print(df)
Result:
   Temperature  upper_row  lower_row
0           10        NaN       30.0
1           30       10.0       20.0
2           20       30.0       40.0
3           40       20.0       50.0
4           50       40.0        NaN
Each row now holds three values (previous, current, next), so you can subtract them, compute a mean, compare them, and so on.
df['difference'] = (df['Temperature'] - df['upper_row']).abs()
df['mean'] = (df['upper_row'] + df['lower_row'])/2
print(df)
Result:
   Temperature  upper_row  lower_row  difference  mean
0           10        NaN       30.0         NaN   NaN
1           30       10.0       20.0        20.0  15.0
2           20       30.0       40.0        10.0  35.0
3           40       20.0       50.0        20.0  35.0
4           50       40.0        NaN        10.0   NaN
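You can then replace Temperature in the rows where the difference is greater than 10. Use a single .loc call rather than chained indexing, which may assign to a temporary copy:
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']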
print(df)
Result:
   Temperature  upper_row  lower_row  difference  mean
0           10        NaN       30.0         NaN   NaN
1           15       10.0       20.0        20.0  15.0
2           20       30.0       40.0        10.0  35.0
3           35       20.0       50.0        20.0  35.0
4           50       40.0        NaN        10.0   NaN
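If the helper columns are not needed afterwards, the same replacement can be sketched without them. This is a minimal sketch and not part of the original answer: it assumes the threshold of 10 from the question, and the names jump, neighbour_mean and smoothed are made up for illustration. Series.diff() gives the difference to the previous row, and Series.mask() replaces the values where the condition is true.

```python
import pandas as pd

df = pd.DataFrame({'Temperature': [10, 30, 20, 40, 50]})
temp = df['Temperature']

# Absolute difference to the previous row; equivalent to (temp - temp.shift()).abs()
jump = temp.diff().abs()

# Mean of the previous and the next row (NaN in the first and last row)
neighbour_mean = (temp.shift() + temp.shift(-1)) / 2

# Where the jump exceeds 10, take the neighbour mean; otherwise keep the value
smoothed = temp.mask(jump > 10, neighbour_mean)
print(smoothed.tolist())
```

The result is a new Series, so df itself stays unchanged until you assign smoothed back to a column.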
Full example:
import pandas as pd
df = pd.DataFrame({'Temperature': [10,30,20,40,50]})
df['upper_row'] = df['Temperature'].shift()
df['lower_row'] = df['Temperature'].shift(-1)
print(df)
df['difference'] = (df['Temperature'] - df['upper_row']).abs()
df['mean'] = (df['upper_row'] + df['lower_row'])/2
print(df)
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']
print(df)
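One edge case is worth keeping in mind: the last row has no lower neighbour, so its mean is NaN, and if its difference exceeds 10 the replacement writes NaN into Temperature. The sketch below is one possible guard, not something the original answer includes. The sample data (ending in 55 instead of 50) and the name replace_mask are assumptions chosen to trigger the case, and the column is created as float so that non-integer means can be stored without a dtype change.

```python
import pandas as pd

# Hypothetical data: the last value jumps by 15, but has no lower neighbour
df = pd.DataFrame({'Temperature': [10.0, 30.0, 20.0, 40.0, 55.0]})

df['difference'] = (df['Temperature'] - df['Temperature'].shift()).abs()
df['mean'] = (df['Temperature'].shift() + df['Temperature'].shift(-1)) / 2

# Replace only where the jump is large AND a neighbour mean actually exists
replace_mask = (df['difference'] > 10) & df['mean'].notna()
df.loc[replace_mask, 'Temperature'] = df['mean']
print(df)
```

With this guard the last row keeps its original value; whether that is the right behaviour for boundary rows depends on your data.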
Edit: you can also use a rolling window to work on two or three consecutive rows at a time. See the comments in the code.
import pandas as pd
df = pd.DataFrame({'Temperature': [10,30,20,40,50]})
# work with two consecutive rows and result assign to last row
rw2 = df['Temperature'].rolling(2)
df['difference'] = rw2.apply(lambda rows:abs(rows[1] - rows[0]), raw=True)
# work with three consecutive rows and result assign to middle/center row
rw3 = df['Temperature'].rolling(3, center=True)
df['mean'] = rw3.apply(lambda rows:(rows[0] + rows[2])/2, raw=True)
print(df)
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']
print(df)
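As a quick sanity check, the shift() version and the rolling-window version can be compared directly. This is only a sketch on the sample data from the answer; the variable names are made up for illustration. Series.equals() treats NaN values in the same positions as equal, which matters here because both approaches leave NaN at the edges.

```python
import pandas as pd

s = pd.Series([10, 30, 20, 40, 50], name='Temperature')

# shift-based helpers
diff_shift = (s - s.shift()).abs()
mean_shift = (s.shift() + s.shift(-1)) / 2

# rolling-window helpers (raw=True passes each window as a NumPy array)
diff_roll = s.rolling(2).apply(lambda rows: abs(rows[1] - rows[0]), raw=True)
mean_roll = s.rolling(3, center=True).apply(lambda rows: (rows[0] + rows[2]) / 2, raw=True)

print(diff_shift.equals(diff_roll))
print(mean_shift.equals(mean_roll))
```

On larger frames the shift-based arithmetic is usually the cheaper option, since rolling(...).apply calls a Python function once per window.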