Tidy method for testing model parameters

Time: 2016-09-16 16:18:55

Tags: r dplyr tidyr broom

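I want to compare model performance for a set of models that use the same predictors but different model parameters. This seems like the place to use broom to create tidy output, but I can't figure it out. Here is some non-working code that shows what I have in mind: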

seq(1:10) %>%
do(fit = knn(train_Market, test_Market, train_Direction, k=.), score = mean(fit==test_Direction)) %>%
tidy()

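For more context, this is part of an ISLR lab that we are trying to rework in a tidy style. You can see the entire lab here: https://github.com/AmeliaMN/tidy-islr/blob/master/lab3/lab3.Rmd

[Update: reproducible example] It is hard to make a minimal example here, because some data wrangling is needed before the model fitting, but this should be reproducible: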

library(ISLR)   # Smarket data
library(class)  # knn()
library(dplyr)

train = Smarket %>%
  filter(Year < 2005)
test = Smarket %>%
  filter(Year >= 2005)

train_Market = train %>%
  select(Lag1, Lag2)
test_Market = test %>%
  select(Lag1, Lag2)

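train_Direction = train %>%
  select(Direction) %>%
  .$Direction

# true labels for the test period, used to score the predictions below
test_Direction = test %>%
  select(Direction) %>%
  .$Direction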

set.seed(1)
knn_pred = knn(train_Market, test_Market, train_Direction, k=1)
mean(knn_pred==test_Direction)

knn_pred = knn(train_Market, test_Market, train_Direction, k=3)
mean(knn_pred==test_Direction)

knn_pred = knn(train_Market, test_Market, train_Direction, k=4)
mean(knn_pred==test_Direction)

1 answer:

Answer 0: (score: 3)

Since each output of your knn (and the oracle) is a vector, this is a good case for tidyr's unnest() (combined with purrr's map and rep_along):

library(class)
library(purrr)
library(tidyr)
set.seed(1)

predictions <- data_frame(k = 1:5) %>%
  unnest(prediction = map(k, ~ knn(train_Market, test_Market, train_Direction, k = .))) %>%
  mutate(oracle = rep_along(prediction, test_Direction))

The predictions variable is then organized as:

# A tibble: 1,260 x 3
       k prediction oracle
   <int>     <fctr> <fctr>
1      1         Up     Up
2      1       Down     Up
3      1         Up   Down
4      1         Up     Up
5      1         Up     Up
6      1       Down     Up
7      1       Down   Down
8      1       Down     Up
9      1       Down     Up
10     1         Up     Up
# ... with 1,250 more rows

Which can be easily summarized:

predictions %>%
  group_by(k) %>%
  summarize(accuracy = mean(prediction == oracle))

Again, you don't need broom here, since each output is a factor. If each output were a model, you could use broom's tidy or augment and then unnest the result in a similar way.
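To illustrate the model case mentioned above, here is a minimal sketch. It is not from the original lab: the mtcars data, the polynomial degree as the varied parameter, and the names degree, fit, coefs and tidied are all assumptions chosen only so the example runs on its own. It fits one lm per parameter value, tidies each fit with broom, and unnests the coefficient tables into one data frame.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Hypothetical stand-in problem: polynomial fits of increasing degree on mtcars
tidied <- tibble(degree = 1:3) %>%
  mutate(
    # one model per parameter value, stored in a list column
    fit = map(degree, function(d) lm(mpg ~ poly(wt, d), data = mtcars)),
    # one tidy coefficient table per model
    coefs = map(fit, tidy)
  ) %>%
  select(degree, coefs) %>%
  unnest(coefs)

tidied
```

The result has one row per coefficient per model, with degree as the grouping key, so it can be summarized with group_by in the same way as predictions.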

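One important aspect of this approach is that it is flexible to many combinations of parameters: combine them with tidyr's crossing (or expand.grid), and use invoke_rows to apply the function to each row of parameters. For example, you could try variations of l alongside k: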

crossing(k = 2:5, l = 0:1) %>%
  invoke_rows(knn, ., train = train_Market, test = test_Market, cl = train_Direction) %>%
  unnest(prediction = .out) %>%
  mutate(oracle = rep_along(prediction, test_Direction)) %>%
  group_by(k, l) %>%
  summarize(accuracy = mean(prediction == oracle))
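
Note that invoke_rows has since moved out of purrr into the purrrlyr package. As a hedged alternative, the sketch below gets a comparable accuracy table using only map2 and map_dbl from purrr. The name results is an assumption, and the data preparation is repeated from the question so that the block runs on its own.

```r
library(ISLR)
library(class)
library(dplyr)
library(purrr)
library(tidyr)

# Same train/test split as in the question
train <- filter(Smarket, Year < 2005)
test  <- filter(Smarket, Year >= 2005)
train_Market <- select(train, Lag1, Lag2)
test_Market  <- select(test, Lag1, Lag2)
train_Direction <- train$Direction
test_Direction  <- test$Direction

set.seed(1)  # knn breaks ties at random
results <- crossing(k = 2:5, l = 0:1) %>%
  mutate(
    # one prediction vector per (k, l) row, kept in a list column
    prediction = map2(k, l, ~ knn(train_Market, test_Market, train_Direction,
                                  k = .x, l = .y)),
    # score each prediction vector against the true test labels
    accuracy = map_dbl(prediction, ~ mean(.x == test_Direction))
  ) %>%
  select(k, l, accuracy)

results
```

Keeping the predictions in a list column avoids the unnest and rep_along steps, at the cost of not having the long per-observation table available for other summaries.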