Fatal error running a linear regression with many terms on the RHS

Asked: 2019-03-16 17:40:05

Tags: r lm

When I run lm, RStudio gives me a fatal error and restarts the session. My source data has several factor columns, each of which expands into hundreds of dummy variables. ld.vars below (which extracts the linearly dependent dummy variables) has 363 entries. My overall regression equation therefore contains hundreds of terms on the RHS: all columns, including the dummy variables, minus the linearly dependent dummy variables:

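y = (all x variables, including every automatically generated dummy variable) - (363 linearly dependent dummy variables)

Is the length of the RHS the source of my fatal error?

Below is my attempt, using code from this SO solution.

Code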

library(car)

load('full_data.rda')

## build original regression with all dummy variables included
formula <- as.formula(paste0("Closing.Cost ~ ", paste(colnames(full_data[-19]), collapse=' + ')))
reg3 <- lm(formula, full_data)

## this line produces a warning: "prediction from a rank-deficient fit may be misleading"
predict(reg3, newdata=full_data[106,], interval="prediction")

## this line produces an error: "Error in vif.default(reg3) : there are aliased coefficients in the model"
vif(reg3)

## find the linearly dependent variables
ld.vars <- attributes(alias(reg3)$Complete)$dimnames[[1]]
ld.vars <- paste0("`", ld.vars, "`")

## remove the linearly dependent variables
formula.new <- as.formula(paste0(formula, " - ", paste0(ld.vars, collapse = " - "), collapse = " - "))

## run new model: this line produces a fatal error
reg4 <- lm(formula.new, full_data)

## assess collinearity of new regression (haven't been able to run this line)
vif(reg4)
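
One point that may be worth checking before attributing the crash to the length of the RHS: paste0() applied directly to a formula object works on its three components (operator, response, right-hand side) rather than on a single string, so the rebuilt formula may not be the one intended. The sketch below uses a small made-up data frame `toy` with hypothetical columns `y`, `a`, `b` and `c` (none of them come from the question) and compares that construction with update(). It is an illustration under those assumptions, not a confirmed explanation of the fatal error.

```r
## Toy data (hypothetical columns): `c` is an exact copy of `a`, so it is aliased
set.seed(1)
toy <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))
toy$c <- toy$a

## Build the full formula the same way as in the question
f <- as.formula(paste0("y ~ ", paste(colnames(toy[-1]), collapse = " + ")))
fit <- lm(f, toy)

## Same extraction as in the question: row names of the complete-alias matrix
ld <- rownames(alias(fit)$Complete)
ld <- paste0("`", ld, "`")

## paste0() on a formula pastes onto each of its three parts separately
as.character(f)
naive <- as.formula(paste0(f, " - ", paste0(ld, collapse = " - "), collapse = " - "))
length(naive)   # a length of 2 means a one-sided formula: the response is gone

## update() edits only the right-hand side and keeps the response
fixed <- update(f, as.formula(paste(". ~ . -", paste(ld, collapse = " - "))))
length(fixed)   # a length of 3 means a two-sided formula

## Refit and check that no coefficient is left as NA in this toy case
fit2 <- lm(fixed, toy)
anyNA(coef(fit2))
```

In this toy case the aliased name is also a column name, so subtracting it removes the term. For factor predictors, the names returned by alias() are coefficient names (a factor name followed by a level), which need not match any term in the formula, so whether subtracting them has the intended effect should be verified separately.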

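I have a couple of factors with hundreds of levels:

> total_levels <- full_data %>% purrr::map(levels) %>% map(length)
> Reduce("+", total_levels)
[1] 1271

> matrix <- model.matrix(formula, full_data)
> dim(matrix)
[1] 4179 1311

Following @BenBolker's comment, I tried a generic lm with a large number of terms on the RHS (2582 columns, enough to equal the number of terms I was attempting in my example above). It worked fine:

dd <- as.data.frame(matrix(rnorm(4179*2582), ncol=2582))
m1 <- lm(V1 ~ ., data=dd)

[Image: fatal error pop-up]

Session info (although this was captured before running the code chunk that raises the fatal error):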

0 answers:

No answers yet