So I created a regression tree with rpart and stored the result in reg_tree:
# show summary statistics of reg_tree
summary(reg_tree)
# store top variables as new values
topvars <- reg_tree$variable.importance
# output of topvars
topvars
  q_21fb1900   q_2b3296a0          q_0   q_fde6a01e   q_7fa850ed   q_323d6cee   q_c6ab3657   q_eb2ad90d   q_5dcb2b57
5.303283e+15 5.196871e+15 4.002239e+15 4.412505e+14 2.616730e+14 2.162128e+14 2.035465e+14 1.354927e+14 5.095959e+13
  q_af2830be   q_caa61b2c   q_a6828865   q_99f5a0bd   q_be83fe28   q_efdc29dd   q_9e86aa7f   q_2ea0e2aa   q_5049294d
2.176437e+13 1.210118e+13 1.126591e+13 8.387189e+12 4.951978e+12 4.115929e+12 3.864235e+12 1.449853e+12 5.436949e+11
  q_5ae0f0cd   q_518fba14
5.436949e+11 5.412242e+11
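I want to extract each of these names as xvar1, xvar2, and so on, and automatically put them into the following model, where each xvar corresponds to a column header:
lm(y_var ~ xvar1 + xvar2 + xvar3 + ..., data)
i.e.
lm(y_var ~ q_21fb1900 + q_2b3296a0 + q_0 + ..., data)
How can I do this so that I can drop in a new dataset without having to manually change each xvar in the future?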
Answer 0 (score: 1)
Try this:
Example:
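library(rpart)  # provides rpart() and the kyphosis data set
reg_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
topvars <- reg_tree$variable.importance
# paste the importance names into "response ~ var1 + var2 + ..." and convert to a formula
myreg <- lm(as.formula(paste("as.numeric(Kyphosis) ~ ", paste(names(topvars), collapse = " + "), sep = "")), data = kyphosis)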
> summary(myreg)
Call:
lm(formula = as.formula(paste("as.numeric(Kyphosis) ~ ", paste(names(topvars),
collapse = " + "), sep = "")), data = kyphosis)
Residuals:
     Min       1Q   Median       3Q      Max
-0.79440 -0.22356 -0.08478  0.10205  0.84768
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.2612198  0.1934124   6.521 6.61e-09 ***
Start       -0.0307392  0.0091166  -3.372  0.00117 **
Age          0.0010657  0.0006937   1.536  0.12858
Number       0.0525555  0.0274522   1.914  0.05928 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3599 on 77 degrees of freedom
Multiple R-squared: 0.2575, Adjusted R-squared: 0.2285
F-statistic: 8.9 on 3 and 77 DF, p-value: 3.912e-05
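To reuse this with a new dataset without editing variable names by hand, one option is to wrap the step in a small function. The sketch below is only an illustration: lm_from_tree, kyph, reg_tree2 and myreg2 are hypothetical names, not part of the answer above. It uses base R's reformulate() in place of the paste()/as.formula() combination, and it assumes the response is a plain numeric column, so the Kyphosis factor is converted beforehand.

```r
library(rpart)  # provides rpart() and the kyphosis data set

# Hypothetical helper (not from the original answer): build an lm() on the
# variables listed in an rpart tree's variable.importance.
lm_from_tree <- function(reg_tree, response, data) {
  topvars <- names(reg_tree$variable.importance)
  # variable.importance is NULL when the tree made no splits
  stopifnot(length(topvars) > 0)
  # reformulate() builds response ~ var1 + var2 + ... from character input
  lm(reformulate(topvars, response = response), data = data)
}

# Assumption for this sketch: convert the factor response to numeric up front
kyph <- kyphosis
kyph$Kyphosis <- as.numeric(kyph$Kyphosis)

reg_tree2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyph)
myreg2 <- lm_from_tree(reg_tree2, "Kyphosis", kyph)
formula(myreg2)
```

With your own data you would pass your fitted tree, the name of your response column (for example "y_var"), and the data frame. Column names that are not syntactically valid R names would need backticks before being handed to reformulate().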