Question

I want to run a linear regression on the data frame below.

test<-data.frame(abc=c(2.4,3.2,8.9,9.8,10.0,3.2,5.4),
             city1_0=c(5.3,2.6,3,5.4,7.8,4.4,5.5),
             city1_1=c(2.3,5.6,3,2.4,3.6,2.4,6.5),
             city1_2=c(4.2,1.4,2.6,2,6,3.6,2.4),
             city1_3=c(2.4,2.6,9.4,4.6,2.5,1.2,7.5),
             city1_4=c(8.2,4.2,7.6,3.4,1.7,5.2,9.7),
             city2_0=c(4.3,8.6,6,3.7,7.8,4.7,5.8),
             city2_1=c(5.3,2.6,3,5.4,7.8,4.4,5.5))

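The data frame "test" is only a sample; the original data frame contains 100 columns. I want to write a script that predicts values using linear regression, and for that I want to build many models with different input variables.

For example, in the given data frame, abc is the y variable. I want to build one model with city1_1, city1_2, city1_3, city1_4 (leaving out city1_0, city2_0). Then another model with city1_2, city1_3, city1_4 (leaving out city1_0, city1_1, city2_0, city2_1), then a third model with the input variables city1_3, city1_4 (leaving out city1_0, city1_1, city1_2, city2_0, city2_1), and so on.

All of these variables are inputs to the linear regression.

I have to do this for 40 data frames. The name of the output (O/P) variable is the same in every data frame.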

Answer 1

You can create a list of formulas using regular expressions and then loop over that list:

# create data
test <- data.frame(abc = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
                   city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
                   city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
                   city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
                   city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
                   city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
                   city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
                   city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5))

# create list of formulas
myformulas <- list(as.formula(paste("abc", paste(grep("city1_[123456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
                   as.formula(paste("abc", paste(grep("city1_[23456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")),
                   as.formula(paste("abc", paste(grep("city1_[3456789]", names(test), value = TRUE), collapse = " + "), sep = " ~ ")))

# check formulas
> myformulas
[[1]]
abc ~ city1_1 + city1_2 + city1_3 + city1_4

[[2]]
abc ~ city1_2 + city1_3 + city1_4

[[3]]
abc ~ city1_3 + city1_4

# loop over formulas
mylms <- lapply(myformulas, function(x) lm(x, data = test))

# get output of linear regressions
> mylms
[[1]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_1      city1_2      city1_3      city1_4
     5.8987      -0.2480       0.6316       1.1810      -1.0420

[[2]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_2      city1_3      city1_4
     4.8903       0.7114       1.1673      -1.0595

[[3]]

Call:
lm(formula = x, data = test)

Coefficients:
(Intercept)      city1_3      city1_4
      7.909        1.047       -1.102

You can even specify your grep() patterns up front and create the formulas in a loop:

mygreps <- c("city1_[123456789]", "city1_[23456789]", "city1_[3456789]")
myformulas <- lapply(mygreps, function(x) as.formula(paste("abc", paste(grep(x, names(test), value = TRUE), collapse = " + "), sep = " ~ ")))
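As a possible variation on the pattern list, the formulas can be derived from the column names themselves, so no pattern has to be typed per model. This is only a sketch: the helper name nested_formulas and its arguments (prefix, response, first) are made up for illustration, it uses reformulate() from the stats package in place of paste() plus as.formula(), and it assumes column names have the form prefix followed by an integer suffix. Unlike the three formulas above, it keeps going until a single predictor is left.

```r
# Sample data from the question
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# Hypothetical helper (not from the original answer): builds one formula per
# starting suffix, dropping the lowest-numbered columns first.
nested_formulas <- function(data, prefix, response = "abc", first = 1) {
  # Columns that are exactly <prefix><integer>, e.g. city1_0 ... city1_4
  vars <- grep(paste0("^", prefix, "[0-9]+$"), names(data), value = TRUE)
  suffix <- as.integer(sub(prefix, "", vars, fixed = TRUE))
  # Sort columns by their numeric suffix
  vars <- vars[order(suffix)]
  suffix <- sort(suffix)
  # One model per starting suffix >= first (suffix 0 is skipped by default)
  starts <- suffix[suffix >= first]
  lapply(starts, function(s) reformulate(vars[suffix >= s], response = response))
}

city1_formulas <- nested_formulas(test, "city1_")
city1_lms <- lapply(city1_formulas, function(f) lm(f, data = test))
```

Because the helper reads names(data), the same call should work on a wider data frame without editing any character class, provided the naming assumption holds.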

Edit:

You can also define the value ranges of your city variables and use paste() to generate the strings.

Example:

myranges <- lapply(1:16, function(x) x:16)
myvars <- paste0("city", 1:10, "_")

Then create the formulas with nested lapply() calls:

myformulas <- lapply(myvars, function(x) {
  lapply(myranges, function(y) {
    as.formula(paste("abc", paste(x, y, sep = "", collapse = " + "), sep = " ~ "))
  })
})

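myformulas will now contain 10 lists (one for each of city1_ to city10_), with 16 formulas in each list (each with a decreasing number of variables, starting with all 16 and ending with only cityX_16). This assumes the full data actually contains the columns city1_1 through city10_16; the small test sample above does not.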

Now just loop over myformulas to get a list of linear regression outputs:

# loop over formulas
mylms <- lapply(myformulas, function(x) lapply(x, function(y) lm(y, data = test)))
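The question also mentions 40 data frames and the goal of predicting values, which the steps above do not show. One way this might look, as a rough sketch only: all_data is a made-up stand-in holding two copies of the sample data (the real 40 data frames are not available here), fit_all is a hypothetical helper, and the three formulas are written out by hand so the block runs on its own.

```r
# Sample data from the question
test <- data.frame(
  abc     = c(2.4, 3.2, 8.9, 9.8, 10.0, 3.2, 5.4),
  city1_0 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5),
  city1_1 = c(2.3, 5.6, 3, 2.4, 3.6, 2.4, 6.5),
  city1_2 = c(4.2, 1.4, 2.6, 2, 6, 3.6, 2.4),
  city1_3 = c(2.4, 2.6, 9.4, 4.6, 2.5, 1.2, 7.5),
  city1_4 = c(8.2, 4.2, 7.6, 3.4, 1.7, 5.2, 9.7),
  city2_0 = c(4.3, 8.6, 6, 3.7, 7.8, 4.7, 5.8),
  city2_1 = c(5.3, 2.6, 3, 5.4, 7.8, 4.4, 5.5)
)

# The three models from the question, written out explicitly
formulas <- list(
  abc ~ city1_1 + city1_2 + city1_3 + city1_4,
  abc ~ city1_2 + city1_3 + city1_4,
  abc ~ city1_3 + city1_4
)

# ASSUMPTION: stand-in for the 40 data frames, collected in a named list
all_data <- list(df1 = test, df2 = test)

# Hypothetical helper: fit every formula on one data frame
fit_all <- function(data) lapply(formulas, function(f) lm(f, data = data))

# Outer list: one entry per data frame; inner list: one lm per formula
all_lms <- lapply(all_data, fit_all)

# Predicted values for each model of the first data frame
preds <- lapply(all_lms$df1, predict, newdata = test)

# R-squared per model, as one simple way to compare the fits
r2 <- sapply(all_lms$df1, function(m) summary(m)$r.squared)
```

Keeping the data frames in a named list means the result has the same names, so a model can be looked up as all_lms$df1[[2]]. With only 7 rows, the predictions here are in-sample fitted values; passing genuinely new rows as newdata would be the usual way to predict.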
