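I ran an OLS regression on a house-price dataset and computed, for each house, the error relative to the predicted value. My data frame has a column with the town each house belongs to. I want to calculate the MAPE for each town. My data frame looks like this: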
HomePr Error Town
1390 0.40093 Clarkvile
2010 0.348902 Petersburg
2393 0.348902 Petersburg
2000 0.348902 Clarkvile
7030 0.348902 Pleasant Place
4025 0.348902 Petersburg
4000 0.348902 Millerstown
2086 0.348902 Pleasant Place
6058 0.348902 Schneider
2000 0.348902 Jebtown
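I want to calculate MAPE by Town. My first step would be to get the list of unique Towns and then compute the MAPE from all the Errors that share each group. I then want a new column, DF$Mape, that gives me the MAPE computed using only the houses in each Town group.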
I'm not sure how to approach this and would appreciate suggestions.
Answer 0 (score: 0)
Something like this?
library(dplyr)
library(tibble)

# Percentage difference between `actual` and `forecasted`, scaled by 0.1.
# Note: this is not the standard MAPE definition, which averages the
# absolute percentage errors of individual observations.
mape <- function(actual, forecasted) {
  0.1 * ((actual - forecasted) / actual) * 100
}

tibble(
  HomePr = c(1390, 2010, 2393, 2000, 7030, 4025, 4000,
             2086, 6058, 2000),
  Error = c(0.40093, 0.348902, 0.348902, 0.348902, 0.348902,
            0.348902, 0.348902, 0.348902, 0.348902, 0.348902),
  Town = c("Clarkvile", "Petersburg", "Petersburg", "Clarkvile",
           "Pleasant Place", "Petersburg", "Millerstown", "Pleasant Place",
           "Schneider", "Jebtown")
) %>%
  group_by(Town) %>%
  # One row per town: mean price and mean error
  summarise(means_pr = mean(HomePr),
            means_err = mean(Error)) %>%
  # Apply mape() to the per-town means
  mutate(Mape = mape(means_pr, means_err))
Result:
# A tibble: 6 x 4
  Town           means_pr means_err     Mape
  <chr>             <dbl>     <dbl>    <dbl>
1 Clarkvile      1695.000  0.374916 9.997788
2 Jebtown        2000.000  0.348902 9.998255
3 Millerstown    4000.000  0.348902 9.999128
4 Petersburg     2809.333  0.348902 9.998758
5 Pleasant Place 4558.000  0.348902 9.999235
6 Schneider      6058.000  0.348902 9.999424
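Update: according to the comments below, the (actual) dataset has Town as a factor. It can simply be converted to character with df <- df %>% mutate(Town = as.character(Town)), where df is the data frame.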
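The question also asks for a new per-row column rather than a summary table. A minimal sketch of one way to get that follows. It assumes (this is not confirmed in the thread) that Error already holds each house's relative error as a fraction, so a town's MAPE is the mean of the absolute Error values in that town, times 100. The data frame name df is likewise only illustrative.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  HomePr = c(1390, 2010, 2393, 2000, 7030, 4025, 4000, 2086, 6058, 2000),
  Error  = c(0.40093, rep(0.348902, 9)),
  Town   = c("Clarkvile", "Petersburg", "Petersburg", "Clarkvile",
             "Pleasant Place", "Petersburg", "Millerstown", "Pleasant Place",
             "Schneider", "Jebtown"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: Error is a relative error expressed as a fraction
# (e.g. 0.35 means the prediction was off by 35%).
# group_by() + mutate() keeps every row and repeats the town-level
# value on each house belonging to that town.
df <- df %>%
  group_by(Town) %>%
  mutate(Mape = mean(abs(Error)) * 100) %>%
  ungroup()
```

Under the same assumption, base R can do this without dplyr: df$Mape <- ave(abs(df$Error), df$Town) * 100. If Error is instead an absolute residual in price units, divide it by the actual price before averaging.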