I have fitted a separate regression for each group in a data frame:
DF.L <- DF %>%
group_by(Channel) %>%
do(Fit = rlm(L ~ -1 + Y + I(Y^2), data = .))
I want to apply this set of regressions to another data frame. To get there, I am first testing how to apply it to the same data frame:
DF %>%
group_by(Channel) %>%
do({
Lfit <- predict(subset(DF.L, Channel == unique(.$Channel))$Fit, .)
data.frame(., Lfit)
})
glimpse(DF)
But I keep getting this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "list"
Calls: %>% ... do_.grouped_df -> eval -> eval -> predict -> predict
What am I doing wrong?
Answer 0 (score: 4)
Using the built-in ChickWeight data:
library(dplyr)
library(MASS)
library(broom)
library(tidyr)
library(ggplot2)
head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
ChickWeight_models <- ChickWeight %>%
group_by(Diet) %>%
do(fit = MASS::rlm(weight ~ Time + I(Time^2), data = .))
ChickWeight_models
Source: local data frame [4 x 2]
Groups: <by row>

# A tibble: 4 x 2
    Diet       fit
* <fctr>    <list>
1      1 <S3: rlm>
2      2 <S3: rlm>
3      3 <S3: rlm>
4      4 <S3: rlm>
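So I have created an object very similar to your DF.L. It is a data frame with four groups, each of which has an rlm object in a list-column named fit.
Now I will make up some data to test these models on. In this case, I just take the original data and add some noise to each variable.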
ChickWeight_simulated <- ChickWeight %>%
mutate(Time = Time + runif(length(Time)),
weight = weight + rnorm(length(weight)))
ChickWeight_simulated
    weight       Time Chick Diet
1 42.72075  0.9786272     1    1
2 51.12669  2.8399631     1    1
3 58.64632  4.4576380     1    1
4 63.77617  6.1083591     1    1
5 75.40434  8.1051792     1    1
6 91.75830 10.7899030     1    1
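The noise above is drawn without a seed, so each run produces different simulated values and the numbers printed here will not reproduce exactly. A minimal sketch of a reproducible variant, assuming an arbitrary seed of 42 and the hypothetical names `chicks` and `chick_sim` (neither is part of the answer's code):

```r
library(dplyr)

set.seed(42)  # arbitrary seed, chosen only for illustration

# Work on a plain data frame copy; ChickWeight carries extra groupedData classes.
chicks <- as.data.frame(ChickWeight)

chick_sim <- chicks %>%
  mutate(Time   = Time + runif(n()),    # uniform noise between 0 and 1
         weight = weight + rnorm(n()))  # standard normal noise

head(chick_sim)
```

Calling `set.seed()` with the same value before the `mutate()` step gives the same simulated data on every run.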
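Now we want to combine the data frame of models with the new data to test them on. First we group_by and tidyr::nest the simulated data. This creates a data frame with four groups and a list-column named data, where each element is a data frame holding the rows of that group.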
ChickWeight_simulated %>% group_by(Diet) %>% nest()
# A tibble: 4 x 2
    Diet               data
  <fctr>             <list>
1      1 <tibble [220 x 3]>
2      2 <tibble [120 x 3]>
3      3 <tibble [120 x 3]>
4      4 <tibble [118 x 3]>
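To see what one nested element actually holds, you can pull it out of the list-column. A small sketch, assuming the hypothetical names `nested` and `diet1_rows` and using the original ChickWeight data rather than the simulated copy:

```r
library(dplyr)
library(tidyr)

nested <- as.data.frame(ChickWeight) %>%
  group_by(Diet) %>%
  nest()

# Each element of `data` is an ordinary data frame with that diet's rows.
# The grouping column (Diet) stays outside, in the outer data frame.
diet1_rows <- nested$data[[which(nested$Diet == 1)]]

dim(diet1_rows)
names(diet1_rows)
```

This is why the nested tibbles above are reported with three columns: Diet is no longer inside them.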
Then we can join it to the data frame of models:
ChickWeight_simulated %>% group_by(Diet) %>% nest() %>%
full_join(ChickWeight_models)
# A tibble: 4 x 3
    Diet               data       fit
  <fctr>             <list>    <list>
1      1 <tibble [220 x 3]> <S3: rlm>
2      2 <tibble [120 x 3]> <S3: rlm>
3      3 <tibble [120 x 3]> <S3: rlm>
4      4 <tibble [118 x 3]> <S3: rlm>
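Now we group by Diet again and use broom::augment to make predictions from each model on the new simulated data. Because each group is a single row, fit and data each contain exactly one element; we have to use [[1]] to extract that single element from each list-column in a usable form.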
ChickWeight_simulated_predicted <-
ChickWeight_simulated %>% group_by(Diet) %>% nest() %>%
full_join(ChickWeight_models) %>%
group_by(Diet) %>%
do(augment(.$fit[[1]], newdata = .$data[[1]]))
head(ChickWeight_simulated_predicted)
# A tibble: 6 x 6
# Groups:   Diet [1]
    Diet   weight       Time Chick  .fitted  .se.fit
  <fctr>    <dbl>      <dbl> <ord>    <dbl>    <dbl>
1      1 42.72075  0.9786272     1 43.62963 2.368838
2      1 51.12669  2.8399631     1 51.80855 1.758385
3      1 58.64632  4.4576380     1 59.67606 1.534051
4      1 63.77617  6.1083591     1 68.43218 1.534152
5      1 75.40434  8.1051792     1 80.00678 1.647612
6      1 91.75830 10.7899030     1 97.26450 1.726331
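The same [[1]] step is most likely what the code in the question is missing: subsetting the models data frame and taking the column with $ still returns a list of length one, and predict has no method for a plain list. A minimal sketch of the difference, using ChickWeight in place of the question's data and the hypothetical names `models`, `fit_column` and `one_fit`:

```r
library(dplyr)

chicks <- as.data.frame(ChickWeight)

# One robust regression per diet, stored in a list-column named fit.
models <- chicks %>%
  group_by(Diet) %>%
  do(fit = MASS::rlm(weight ~ Time + I(Time^2), data = .))

diet1 <- subset(chicks, Diet == 1)

# Taking the column from a one-row subset still gives a list ...
fit_column <- subset(models, Diet == 1)$fit
class(fit_column)

# ... so extract the single rlm object before calling predict().
one_fit <- fit_column[[1]]
class(one_fit)

pred <- predict(one_fit, newdata = diet1)
head(pred)
```

Under that assumption, wrapping the subset in the question as `subset(DF.L, Channel == unique(.$Channel))$Fit[[1]]` should let predict dispatch on the rlm object instead of the list.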
To show that each diet's simulated data really is predicted only by that diet's own model, we can plot the model fits.
ChickWeight_simulated_predicted %>%
ggplot(aes(Time, weight)) +
geom_point(shape = 1) +
geom_ribbon(aes(Time,
ymin = .fitted-1.96*.se.fit,
ymax = .fitted+1.96*.se.fit),
alpha = 0.5, fill = "black") +
geom_line(aes(Time, .fitted), size = 1, color = "red") +
facet_wrap(~Diet)
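Answer 1 (score: 2)
I think your error comes from how you call predict. I can't fix your exact code, but here is a simple way to get predictions from the models. A more sophisticated approach using purrr and nest is outlined at: http://ijlyttle.github.io/isugg_purrr/presentation.html#(1)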
UPDATE - the purrr and nest way
I am just adding this to show that it can be done easily within the tidyverse using predict. See the link above for details.
library(tidyverse)
# shuffle the rows to mix up the species
set.seed(1234)
myiris <- iris[sample(nrow(iris), replace = F),]
# create first dataset - use the first 50 rows for running the model
iris_nested <-
myiris[1:50,] %>%
nest(-Species) %>%
rename(myorigdata = data)
# create second dataset - use the other 100 rows for making predictions
new_iris_nested <-
myiris[51:150,] %>%
nest(-Species) %>%
rename(mynewdata = data)
# make a model function
my_rlm <- function(df) {
MASS::rlm(Sepal.Length ~ Petal.Length + Petal.Width, data = df)
}
# get the predictions (see the GitHub link above which breaks this into steps)
predictions_tall <-
iris_nested %>%
mutate(my_model = map(myorigdata, my_rlm)) %>%
full_join(new_iris_nested, by = "Species") %>%
mutate(my_new_pred = map2(my_model, mynewdata, predict)) %>%
select(Species, mynewdata, my_new_pred) %>%
unnest(mynewdata, my_new_pred) %>%
rename(modeled = my_new_pred, measured = Sepal.Length) %>%
gather("Type", "Sepal.Length", modeled, measured)
The nested predictions_tall object looks like this:
predictions_tall %>% nest(-Species, -Type) %>% as_tibble()
# A tibble: 6 x 3
Species Type data
<fctr> <chr> <list>
1 setosa modeled <data.frame [32 x 4]>
2 versicolor modeled <data.frame [33 x 4]>
3 virginica modeled <data.frame [35 x 4]>
4 setosa measured <data.frame [32 x 4]>
5 versicolor measured <data.frame [33 x 4]>
6 virginica measured <data.frame [35 x 4]>
Finally, a plot showing the predictions:
predictions_tall %>%
ggplot(aes(x = Petal.Length, y = Sepal.Length)) +
geom_line(aes(color = Species, linetype = Type))
ORIGINAL - the broom way
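I have now updated this so that the predictions for each group are computed using only that group's model.
This approach uses the broom package, specifically the augment function, to add the fitted values. See here: https://cran.r-project.org/web/packages/broom/vignettes/broom.html
Since you did not provide data, I use iris here.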
library(tidyverse)
library(broom)
# first shuffle around the rows of iris
set.seed(1234)
myiris <- iris[sample(nrow(iris), replace = F),]
# first data - first 25 rows for running the models on
origiris <-
myiris[1:25,] %>%
nest(-Species) %>%
rename(origdata = data)
# second data - last 50 rows for predicting on
prediris <-
myiris[101:150,] %>%
nest(-Species) %>%
rename(preddata = data)
# estimate models on the first 25 rows
# a separate model is estimated for each species
iris_mod <-
origiris %>%
mutate(mod = map(origdata, ~ MASS::rlm(Sepal.Length ~ Petal.Length + Petal.Width, data = .)))
First, get the fitted values for the original dataset (not required, for illustration only):
# get fitted values for the first dataset (origdata)
origiris_aug <-
iris_mod %>%
mutate(origpred = map(mod, augment)) %>%
unnest(origpred) %>%
as_tibble()
origiris_aug
The data frame of fitted values looks like this:
origiris_aug
# A tibble: 25 x 10
Species .rownames Sepal.Length Petal.Length Petal.Width .fitted .se.fit .resid
<fctr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 18 5.1 1.4 0.3 5.002797 0.1514850 0.09720290
2 setosa 2 4.9 1.4 0.2 4.931824 0.1166911 -0.03182417
3 setosa 34 5.5 1.4 0.2 4.931824 0.1166911 0.56817583
4 setosa 40 5.1 1.5 0.2 4.981975 0.1095883 0.11802526
5 setosa 39 4.4 1.3 0.2 4.881674 0.1422123 -0.48167359
6 setosa 36 5.0 1.2 0.2 4.831523 0.1784156 0.16847698
7 setosa 25 4.8 1.9 0.2 5.182577 0.2357614 -0.38257703
8 setosa 31 4.8 1.6 0.2 5.032125 0.1241074 -0.23212531
9 setosa 42 4.5 1.3 0.3 4.952647 0.1760223 -0.45264653
10 setosa 21 5.4 1.7 0.2 5.082276 0.1542594 0.31772411
# ... with 15 more rows, and 2 more variables: .hat <dbl>, .sigma <dbl>
Now for what you actually want: predictions on the new dataset:
# get fitted values for the second dataset (preddata)
# each model is fitted to the appropriate species' nested dataframe
prediris_aug <-
iris_mod %>%
inner_join(prediris, by = "Species") %>%
map2_df(.x = iris_mod$mod, .y = prediris$preddata, .f = ~augment(.x, newdata = .y)) %>%
as_tibble()
prediris_aug
The resulting data frame looks like this:
prediris_aug
# A tibble: 50 x 7
.rownames Sepal.Length Sepal.Width Petal.Length Petal.Width .fitted .se.fit
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 105 6.5 3.0 5.8 2.2 8.557908 3.570269
2 115 5.8 2.8 5.1 2.4 8.348800 3.666631
3 117 6.5 3.0 5.5 1.8 8.123565 3.005888
4 139 6.0 3.0 4.8 1.8 7.772511 2.812748
5 103 7.1 3.0 5.9 2.1 8.537086 3.475224
6 107 4.9 2.5 4.5 1.7 7.551086 2.611123
7 119 7.7 2.6 6.9 2.3 9.180537 4.000412
8 135 6.1 2.6 5.6 1.4 7.889823 2.611457
9 124 6.3 2.7 4.9 1.8 7.822661 2.838502
10 118 7.7 3.8 6.7 2.2 9.009263 3.825613
# ... with 40 more rows
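One caveat about the map2_df call above: it takes the models from iris_mod$mod and the new data from prediris$preddata, not from the joined table, so a model and a data set are paired by position. That only matches species correctly if both tables list the species in the same order. In the output above the first rows have row names above 100 (virginica in iris), while origiris_aug starts with setosa, so the orders may well differ there. A hedged sketch of pairing by key instead, using predict rather than augment and the hypothetical names `fit_rows`, `pred_rows`, `models`, `new_data` and `paired`:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Illustrative split: odd rows for fitting, even rows for prediction.
fit_rows  <- iris[seq(1, nrow(iris), by = 2), ]
pred_rows <- iris[seq(2, nrow(iris), by = 2), ]

# One robust regression per species, kept next to its Species key.
models <- fit_rows %>%
  group_by(Species) %>%
  nest() %>%
  ungroup() %>%
  mutate(mod = map(data, ~ MASS::rlm(Sepal.Length ~ Petal.Length + Petal.Width,
                                     data = .x))) %>%
  select(Species, mod)

# New data nested per species, deliberately in a different row order.
new_data <- pred_rows %>%
  group_by(Species) %>%
  nest() %>%
  ungroup() %>%
  rename(preddata = data) %>%
  arrange(desc(Species))

# Join on Species first, then map over the columns of the joined table,
# so each model only ever sees its own species' rows.
paired <- models %>%
  inner_join(new_data, by = "Species") %>%
  mutate(pred = map2(mod, preddata, ~ predict(.x, newdata = .y)))

lengths(paired$pred)
```

Because the mapping runs over columns of the joined table, the row order of either input no longer matters.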