Question: How to apply a function to multiple data frames

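I am trying to write a function that uses the roll_lm function from the roll package to run rolling regressions across multiple columns of a data frame. I then want to apply this function to multiple data frames that have the same format but different x values. I run the regressions with x = RunTime, and each of the remaining columns is used in turn as the y value. The RunTime values differ between data frames. Below is what I have so far, which does what I want on the data frame Sum_9.18; I have many other similar data frames.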

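library(roll)

# Rolling regression of one y column on RunTime, using a window of 100 rows
fun1 <- function(y) {
  roll_lm(Sum_9.18$RunTime, y, width = 100)
}
# Apply it to the y columns (columns 2 to 11) of Sum_9.18
test1 <- data.frame(lapply(Sum_9.18[2:11], fun1))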

Answer 1

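I will give a solution that uses an lm() model, since I don't have the roll package installed.

First, I will assume that you are able to store your data frames in a list. In that case, you can use lapply() to iterate over each data frame to which you want to apply your fun1() function.

So conceptually we will have two functions called by lapply(): one at the outer level, which iterates over the data frames, and one at the inner level, which iterates over the columns of the data frame being analyzed. The inner function will be your fun1() function, but for clarity I will call it fun_inner().

So, assuming you have a list of data frames called list_of_dfs, the following functions will let you do what you want (you just need to adapt the fun_inner() function so that it calls the roll_lm() function):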

#' Fits a predictive model using each column of a data frame as target variable
#'
#' @param df data frame with the data to fit.
#' @param target_columns name or index of the column(s) in data frame \code{df}
#' to be used as target variables, one at a time.
#' @param pred_columns name or index of the column(s) in data frame \code{df}
#' to be used as predictor variables.
#' @param fun_model function to be called that fits the model \code{y ~ X}
#' where \code{y} is each variable taken from the \code{target_columns} of the input data frame
#' and \code{X} is the set of predictor variables taken from the \code{pred_columns}
#' of the input data frame.
#'
#' @return a list containing the result of the model fit to each target variable.
fun_outer_iterate_on_dfs <- function(df, target_columns, pred_columns, fun_model) {
  lapply(df[, target_columns, drop=FALSE], fun_model, as.matrix(df[, pred_columns, drop=FALSE]))
}

#' Fits an lm( y ~ X ) model. \code{y} is a numeric vector and \code{X} is a matrix.
fun_inner_fit_model <- function(y, X) {
  lm( y ~ X )
}

Here is a usage example:

set.seed(1717)
nobs = 10

# List containing the data frames to be used in the model fits
list_of_dfs = list()
list_of_dfs[[1]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))
list_of_dfs[[2]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))
list_of_dfs[[3]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))

test_models_on_each_df <- lapply(list_of_dfs, fun_outer_iterate_on_dfs, c("y1", "y2"), "RunTime", fun_inner_fit_model)

Note how further arguments (other than the first) are passed to the function called by lapply() simply by listing them after the function name (in this case fun_outer_iterate_on_dfs()).

The column names passed to the fun_outer_iterate_on_dfs() function can also be column indices.

The code above produces output like this:

[[1]]
[[1]]$y1

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
   -0.05994     -0.11727  


[[1]]$y2

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
    0.02854     -0.08574  


[[2]]
[[2]]$y1

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
   -0.23479     -0.01973  


[[2]]$y2

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
    0.07248     -0.33088  


[[3]]
[[3]]$y1

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
    -0.3087      -0.1191  


[[3]]$y2

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept)            X  
     0.1765       0.5085  

Here we see two regressions for each of the three data frames, one for target variable y1 and another for target variable y2.
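As a rough illustration of adapting the inner function to a rolling regression, here is a minimal sketch that uses only base R, so it runs without the roll package. The names fun_inner_rolling and rolling_fits, the window width of 5, and the 20-row example data are all assumptions made for this example. If roll is installed, the body of the inner function could presumably be replaced by a call of the form roll_lm(X, y, width = width), following the argument order used in the question.

```r
# Outer function as in the answer: iterate over the target columns of one data frame.
fun_outer_iterate_on_dfs <- function(df, target_columns, pred_columns, fun_model) {
  lapply(df[, target_columns, drop = FALSE], fun_model,
         as.matrix(df[, pred_columns, drop = FALSE]))
}

# Hypothetical inner function: rolling least-squares fit of y on X (base R only).
# Row i of the result holds the coefficients fitted on rows (i - width + 1):i.
# The first width - 1 rows stay NA because their windows are incomplete.
fun_inner_rolling <- function(y, X, width) {
  X <- as.matrix(X)
  n <- length(y)
  x_names <- colnames(X)
  if (is.null(x_names)) x_names <- paste0("x", seq_len(ncol(X)))
  coefs <- matrix(NA_real_, nrow = n, ncol = ncol(X) + 1,
                  dimnames = list(NULL, c("(Intercept)", x_names)))
  if (n >= width) {
    for (end in width:n) {
      rows <- (end - width + 1):end
      # Column of ones supplies the intercept, since lm.fit() does not add one.
      fit <- lm.fit(cbind(1, X[rows, , drop = FALSE]), y[rows])
      coefs[end, ] <- fit$coefficients
    }
  }
  coefs
}

# Made-up example data in the same shape as in the answer.
set.seed(1717)
nobs <- 20
list_of_dfs <- list(
  data.frame(RunTime = rnorm(nobs), y1 = rnorm(nobs), y2 = rnorm(nobs)),
  data.frame(RunTime = rnorm(nobs), y1 = rnorm(nobs), y2 = rnorm(nobs))
)

# The window width is fixed through an anonymous function, so the outer
# function keeps the two-argument (y, X) interface it expects.
rolling_fits <- lapply(list_of_dfs, fun_outer_iterate_on_dfs,
                       c("y1", "y2"), "RunTime",
                       function(y, X) fun_inner_rolling(y, X, width = 5))
```

With this layout, rolling_fits[[1]]$y1 is a matrix with one row per observation and one column per coefficient, which is usually easier to bind back onto the original data frame than a list of lm objects.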

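Finally, if you already have the data frames stored as separate objects, you can collect them into a list by matching on their names, assuming all the data frame names follow the pattern Sum_* and every object defined in the workspace that matches that pattern is a data frame of interest.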
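One possible way to do this collection step is sketched below with base R's ls() and mget(). This is an illustrative reconstruction rather than the answer's own snippet. The two small data frames are made-up stand-ins for the real Sum_* objects, and Sum_9.19 is a hypothetical name.

```r
# Made-up stand-ins for the real Sum_* data frames.
Sum_9.18 <- data.frame(RunTime = 1:3, y1 = c(2, 4, 6))
Sum_9.19 <- data.frame(RunTime = 4:6, y1 = c(1, 3, 5))

# Names of all objects in the current environment that start with "Sum_".
df_names <- ls(pattern = "^Sum_")

# mget() looks the names up and returns a named list of the objects.
list_of_dfs <- mget(df_names)

# Optional guard: drop anything matching the pattern that is not a data frame.
list_of_dfs <- Filter(is.data.frame, list_of_dfs)
```

Because the list keeps the object names, results computed with lapply() over list_of_dfs can be traced back to the data frame they came from.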
