Calculating MAPE over rows by group

Time: 2017-06-12 19:43:03

Tags: r statistics grouping

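I ran an OLS regression on a housing price dataset and computed, for each house, the error relative to the predicted value. My data frame has a column giving the town each house belongs to. I want to compute the MAPE for each town. My data frame looks like this: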

HomePr  Error     Town
1390    0.40093   Clarkvile
2010    0.348902  Petersburg
2393    0.348902  Petersburg
2000    0.348902  Clarkvile
7030    0.348902  Pleasant Place
4025    0.348902  Petersburg
4000    0.348902  Millerstown
2086    0.348902  Pleasant Place
6058    0.348902  Schneider
2000    0.348902  Jebtown

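I want to compute MAPE by Town. So my first step would be to get the list of unique Towns, then compute the MAPE using all the Errors that share that group. I then want a new column, DF$Mape, that gives me the MAPE computed from only the houses in each Town group.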

I'm not sure how to approach this. Looking for suggestions.

1 answer:

Answer 0: (score: 0)

Like this?

library(dplyr); library(tibble)

# Percentage-error helper, kept as posted (including the 0.1 factor).
# Note: it applies neither abs() nor mean(), so it is not the textbook MAPE,
# mean(abs((actual - forecasted) / actual)) * 100.
mape <- function(actual, forecasted) {
  0.1 * ((actual - forecasted) / actual) * 100
}

tibble(
  HomePr = c(1390, 2010, 2393, 2000, 7030, 4025, 4000,
             2086, 6058, 2000),
  Error = c(0.40093, 0.348902, 0.348902, 0.348902, 0.348902,
            0.348902, 0.348902, 0.348902, 0.348902, 0.348902),
  Town = c("Clarkvile", "Petersburg", "Petersburg", "Clarkvile",
           "Pleasant Place", "Petersburg", "Millerstown", "Pleasant Place",
           "Schneider", "Jebtown")
) %>%
  group_by(Town) %>%                         # one group per town
  summarise(means_pr = mean(HomePr),         # mean home price per town
            means_err = mean(Error)) %>%     # mean error per town
  mutate(Mape = mape(means_pr, means_err))   # helper applied to the group means

Result:

# A tibble: 6 x 4
            Town means_pr means_err     Mape
           <chr>    <dbl>     <dbl>    <dbl>
1      Clarkvile 1695.000  0.374916 9.997788
2        Jebtown 2000.000  0.348902 9.998255
3    Millerstown 4000.000  0.348902 9.999128
4     Petersburg 2809.333  0.348902 9.998758
5 Pleasant Place 4558.000  0.348902 9.999235
6      Schneider 6058.000  0.348902 9.999424
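The question asks for a per-row Mape column rather than a one-row-per-town summary. A minimal sketch of that, assuming the Error column already holds each house's relative error, (actual - predicted) / actual, which the question does not state explicitly: replacing summarise() with a grouped mutate() keeps all ten rows and repeats each town's value on every house in that town.

```r
library(dplyr)

# Example data copied from the question.
df <- data.frame(
  HomePr = c(1390, 2010, 2393, 2000, 7030, 4025, 4000, 2086, 6058, 2000),
  Error  = c(0.40093, rep(0.348902, 9)),
  Town   = c("Clarkvile", "Petersburg", "Petersburg", "Clarkvile",
             "Pleasant Place", "Petersburg", "Millerstown", "Pleasant Place",
             "Schneider", "Jebtown")
)

# ASSUMPTION: Error is a per-house relative error, so the MAPE of a town is
# the mean of the absolute errors in that town, expressed as a percentage.
df <- df %>%
  group_by(Town) %>%                        # one group per town
  mutate(Mape = mean(abs(Error)) * 100) %>% # group value repeated on each row
  ungroup()

print(df)
```

If dplyr is not wanted, base R's ave() should give the same column under the same assumption: df$Mape <- ave(abs(df$Error), df$Town) * 100.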

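Update: per the comments below, the (actual) dataset has Town as a factor. It can simply be converted to character with df <- df %>% mutate(Town = as.character(Town)), where df is the data frame.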
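As a small illustration of that conversion, here is a sketch on a hypothetical three-row data frame (a subset of the question's values, with Town deliberately stored as a factor):

```r
library(dplyr)

# Hypothetical data frame whose Town column is a factor.
df <- data.frame(
  Town  = factor(c("Clarkvile", "Petersburg", "Clarkvile")),
  Error = c(0.40093, 0.348902, 0.348902)
)

# Convert the factor to a plain character column before grouping.
df <- df %>% mutate(Town = as.character(Town))

print(class(df$Town))
```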