Pandas DataFrame: storing root-mean-square error data

Time: 2017-11-12 19:20:06

Tags: python pandas dataframe

I have two Pandas DataFrames that hold mostly different data:

err_df =
            2        3        11        13         14         16
4   122.153000  56.3023  21.2722  71.79590   81.63212        NaN   
8    70.967800  19.5768  69.9780  21.11050  116.89777        NaN   
12   70.659100  19.5768      NaN  39.46288   70.62480  70.597850
16   19.237067      NaN      NaN  18.93980   18.60660  19.104767
20   19.349440      NaN      NaN  19.38080        NaN  36.785533
24         NaN      NaN      NaN  17.92060        NaN        NaN 

temp_df =
         2        3        11       13       14       16
4   89.5488  122.153  121.957  122.153  122.153      NaN
8   89.5488  122.153  121.957  122.153  122.153      NaN
12  89.5488  122.153      NaN  122.153  122.153  122.153
16  89.5488      NaN      NaN  122.153  122.153  122.153
20  89.5488      NaN      NaN  122.153      NaN  122.153
24      NaN      NaN      NaN  122.153      NaN      NaN  

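I want to compute the root-mean-square error (RMSE) between the matching columns of the two DataFrames and store the results in a third DataFrame. I know how to compute the RMSE for a single column, say column 2: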

print(((err_df[2] - temp_df[2])**2).mean()**0.5)
result = 48.2427158719
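As a small illustration of why the one-column expression works even when values are missing, here is a sketch with made-up numbers (the Series `a` and `b` are hypothetical, not the data from this question). It relies on `Series.mean` skipping NaN by default, and cross-checks the result against NumPy's `nanmean`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last pair has a missing value.
a = pd.Series([1.0, 2.0, np.nan])
b = pd.Series([4.0, 6.0, 5.0])

# Squared differences: NaN propagates through the arithmetic.
sq = (a - b) ** 2            # 9.0, 16.0, NaN

# Series.mean() uses skipna=True by default, so only 9 and 16 are averaged.
rmse = sq.mean() ** 0.5

# Same calculation with NumPy, ignoring NaN explicitly.
rmse_np = np.sqrt(np.nanmean(sq.values))

print(rmse, rmse_np)
```

Both values are the square root of 12.5, because the NaN row is dropped from the average rather than counted as zero.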

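The NaN values are not a problem either - they are ignored, which is a relief, because I believe sklearn's mean_squared_error function raises this error: ValueError: Array contains NaN or infinity.
Basically, I want to be able to compute the RMSE values "dynamically", without having to change the column names every time the main program runs. The third DataFrame that holds the results should look like this: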

df3 =
              2        3        11        13         14         16
0   48.2427158719  "RMSE"    "RMSE"    "RMSE"     "RMSE"     "RMSE"  

How can I do this?
Any help is appreciated. Thanks in advance :)
(Using an Ubuntu 14.04 32-bit VM and Python 2.7)

1 Answer:

Answer 0: (score: 1)

((err_df-temp_df)**2).mean(0)**0.5
Out[318]: 
2     48.242716
3     91.978382
11    80.122548
13    92.792388
14    61.332234
16    82.793873
dtype: float64
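The expression above returns a Series indexed by column label, while the question asks for a one-row DataFrame. One possible way to get that shape, sketched here on small made-up frames (the values below are hypothetical, not the question's data), is to convert the Series with `to_frame()` and transpose it:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames with the same labels in both.
err_df = pd.DataFrame({2: [1.0, 2.0, np.nan], 3: [4.0, np.nan, np.nan]},
                      index=[4, 8, 12])
temp_df = pd.DataFrame({2: [4.0, 6.0, np.nan], 3: [1.0, np.nan, np.nan]},
                       index=[4, 8, 12])

# Column-wise RMSE; mean(axis=0) skips NaN within each column.
rmse = ((err_df - temp_df) ** 2).mean(axis=0) ** 0.5

# Turn the Series into a single-row DataFrame: one column per original column.
df3 = rmse.to_frame().T

print(df3)
```

Because the subtraction aligns on both index and column labels, no column names need to be hard-coded; any column present in both frames gets its own RMSE entry in `df3`.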