I have two pandas DataFrames that mostly contain different data:
err_df =
             2        3       11        13         14         16
4   122.153000  56.3023  21.2722  71.79590   81.63212        NaN
8    70.967800  19.5768  69.9780  21.11050  116.89777        NaN
12   70.659100  19.5768      NaN  39.46288   70.62480  70.597850
16   19.237067      NaN      NaN  18.93980   18.60660  19.104767
20   19.349440      NaN      NaN  19.38080        NaN  36.785533
24         NaN      NaN      NaN  17.92060        NaN        NaN
temp_df =
          2        3       11       13       14       16
4   89.5488  122.153  121.957  122.153  122.153      NaN
8   89.5488  122.153  121.957  122.153  122.153      NaN
12  89.5488  122.153      NaN  122.153  122.153  122.153
16  89.5488      NaN      NaN  122.153  122.153  122.153
20  89.5488      NaN      NaN  122.153      NaN  122.153
24      NaN      NaN      NaN  122.153      NaN      NaN
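I want to compute the root-mean-square error (RMSE) between the corresponding columns of the two DataFrames and store the results in a third DataFrame. I know how to compute the RMSE for a single column, say column 2: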
print(((err_df[2] - temp_df[2])**2).mean()**0.5)
result = 48.2427158719
NaN values are not a problem here either: they are ignored, which is a relief, because I believe sklearn's mean_squared_error function fails on them with this error: ValueError: Array contains NaN or infinity.
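As a small illustration of why the NaN values are ignored: pandas' mean() skips NaN by default (skipna=True), so only the rows where both inputs are present contribute to the result. The two Series below are made-up values for demonstration only, not taken from the DataFrames above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only to show the NaN handling.
a = pd.Series([1.0, 2.0, np.nan, 4.0])
b = pd.Series([1.5, np.nan, 3.0, 2.0])

# The squared difference is NaN wherever either input is NaN.
sq = (a - b) ** 2

# mean() skips NaN by default, so only complete pairs are averaged.
rmse = sq.mean() ** 0.5

# count() reports how many pairs actually contributed.
n_used = sq.count()

print(rmse, n_used)
```

Checking count() alongside the RMSE is a cheap way to notice columns where the value rests on very few rows.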
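Basically, I want to be able to compute the RMSE values "dynamically", without having to change the column each time I run the main program.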
The third DataFrame, which holds the results, should look like this:
df3 =
2 3 11 13 14 16
0 48.2427158719 "RMSE" "RMSE" "RMSE" "RMSE" "RMSE"
How can I do this?
Any help is appreciated. Thanks in advance :)
(Using an Ubuntu 14.04 32-bit VM and Python 2.7)
Answer 0 (score: 1)
((err_df-temp_df)**2).mean(0)**0.5
Out[318]:
2 48.242716
3 91.978382
11 80.122548
13 92.792388
14 61.332234
16 82.793873
dtype: float64
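The expression above returns a Series indexed by column label, whereas the question asks for a one-row DataFrame. A minimal sketch of one way to get that shape, assuming the two DataFrames are rebuilt from the values shown in the question, is to convert the Series with to_frame() and transpose it. The name df3 comes from the question; cols, idx and rmse are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = [2, 3, 11, 13, 14, 16]   # column labels from the question
idx = [4, 8, 12, 16, 20, 24]    # row labels from the question

err_df = pd.DataFrame(
    [[122.153000, 56.3023, 21.2722, 71.79590, 81.63212, nan],
     [70.967800, 19.5768, 69.9780, 21.11050, 116.89777, nan],
     [70.659100, 19.5768, nan, 39.46288, 70.62480, 70.597850],
     [19.237067, nan, nan, 18.93980, 18.60660, 19.104767],
     [19.349440, nan, nan, 19.38080, nan, 36.785533],
     [nan, nan, nan, 17.92060, nan, nan]],
    index=idx, columns=cols)

temp_df = pd.DataFrame(
    [[89.5488, 122.153, 121.957, 122.153, 122.153, nan],
     [89.5488, 122.153, 121.957, 122.153, 122.153, nan],
     [89.5488, 122.153, nan, 122.153, 122.153, 122.153],
     [89.5488, nan, nan, 122.153, 122.153, 122.153],
     [89.5488, nan, nan, 122.153, nan, 122.153],
     [nan, nan, nan, 122.153, nan, nan]],
    index=idx, columns=cols)

# Column-wise RMSE; NaN cells are skipped by mean().
rmse = ((err_df - temp_df) ** 2).mean(axis=0) ** 0.5

# Turn the Series into a single-row DataFrame with row label 0.
df3 = rmse.to_frame().T

print(df3)
```

Because the subtraction aligns on both row and column labels, nothing in this sketch names an individual column, so it does not need to change if columns are added or removed.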