So I have data like this:
##     V2   V3     V4   V5   V6   V7    V8
## 2 27.0 41.3 2948.0 26.2 51.7 42.7  89.8
## 3 22.9 66.7 4644.0  3.0 45.7 41.8 121.3
## 4 26.3 58.1 3665.0  3.0 50.8 38.5 115.2
## 5 29.1 39.9 2878.0 18.3 51.5 38.8 100.3
## 6 28.1 62.6 4493.0  7.0 50.8 39.7 123.0
## 7 26.2 63.9 3855.0  3.0 50.7 31.1 124.8
I want to fit a multiple linear regression:
model1 = lm(cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + cigarette.data$V7, data = cigarette.data)
But this gives me:
##
## Call:
## lm(formula = cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 +
##     cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 +
##     cigarette.data$V7, data = cigarette.data)
##
## Residuals:
## ALL 51 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (186 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 19         NA      NA       NA
## cigarette.data$V223.1       20         NA      NA       NA
## cigarette.data$V223.9       23         NA      NA       NA
## cigarette.data$V224.8      -16         NA      NA       NA
## cigarette.data$V225.0       21         NA      NA       NA
## cigarette.data$V225.1       25         NA      NA       NA
## cigarette.data$V225.9       -9         NA      NA       NA
## cigarette.data$V226.2        8         NA      NA       NA
This doesn't look right. What is going on?
Answer 0 (score: 3)
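The problem is that the model you are fitting has more predictors than samples (i.e., rows). Your example contains 6 samples, so an intercept plus 5 variables (6 coefficients) already predicts V8 perfectly. That leaves no residual degrees of freedom and nothing with which to estimate a sixth variable: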
cigarette.data <- structure(list(V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2), V3 = c(41.3,
66.7, 58.1, 39.9, 62.6, 63.9), V4 = c(2948, 4644, 3665, 2878,
4493, 3855), V5 = c(26.2, 3, 3, 18.3, 7, 3), V6 = c(51.7, 45.7,
50.8, 51.5, 50.8, 50.7), V7 = c(42.7, 41.8, 38.5, 38.8, 39.7,
31.1), V8 = c(89.784450178314, 121.359442280557, 115.031032135658,
100.201279353697, 123.401631728502, 124.750887806)), .Names = c("V2",
"V3", "V4", "V5", "V6", "V7", "V8"), row.names = c(NA, -6L), class = "data.frame")
fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
summary(fit)
Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.89203         NA      NA       NA
V2           5.66196         NA      NA       NA
V3           2.16574         NA      NA       NA
V4          -0.01412         NA      NA       NA
V5           0.03093         NA      NA       NA
V6          -4.07376         NA      NA       NA
V7                NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1,  Adjusted R-squared:    NaN
F-statistic:   NaN on 5 and 0 DF,  p-value: NA
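The counting argument above can be checked directly on a fitted model. The following is a minimal sketch, not part of the original answer: it rebuilds the six rows using the rounded V8 values printed in the question (an assumption, since the answer's own V8 values differ slightly), and the variable names `n_rows`, `n_coefs`, `n_fitted` and `resid_df` are only illustrative.

```r
# Six-row data frame mirroring the example; V8 uses the rounded values
# from the question's printout (assumption, for illustration only).
cigarette.data <- data.frame(
  V2 = c(27.0, 22.9, 26.3, 29.1, 28.1, 26.2),
  V3 = c(41.3, 66.7, 58.1, 39.9, 62.6, 63.9),
  V4 = c(2948, 4644, 3665, 2878, 4493, 3855),
  V5 = c(26.2, 3.0, 3.0, 18.3, 7.0, 3.0),
  V6 = c(51.7, 45.7, 50.8, 51.5, 50.8, 50.7),
  V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 31.1),
  V8 = c(89.8, 121.3, 115.2, 100.3, 123.0, 124.8)
)

fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

n_rows   <- nrow(cigarette.data)     # number of observations
n_coefs  <- length(coef(fit))        # coefficients requested, including the intercept
n_fitted <- sum(!is.na(coef(fit)))   # coefficients lm() could actually estimate
resid_df <- df.residual(fit)         # observations minus estimated coefficients

cat("rows:", n_rows, " requested:", n_coefs,
    " estimated:", n_fitted, " residual df:", resid_df, "\n")
```

When the residual degrees of freedom come out as zero, the standard errors, t values and p-values cannot be computed, which is why they are all reported as NA.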
Your model needs either fewer variables or more samples. For example, with fewer variables:
fit <- lm(V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)
summary(fit)
Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)

Residuals:
      1       2       3       4       5       6
-1.1873  0.9570 -2.9738  1.9870 -0.7142  1.9312

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.846025  57.297709   0.311    0.808
V2           1.848628   1.240164   1.491    0.376
V3           0.802375   0.879204   0.913    0.529
V4           0.001821   0.008315   0.219    0.863
V5          -0.583697   0.601185  -0.971    0.509

Residual standard error: 4.4 on 1 degrees of freedom
Multiple R-squared:  0.981,  Adjusted R-squared:  0.9052
F-statistic: 12.94 on 4 and 1 DF,  p-value: 0.2052
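One further check may be worth making. The question's output reports 51 residuals and lists coefficients such as `cigarette.data$V223.1`, which is the column name followed by a data value. `lm()` names coefficients that way when a column is stored as a factor or character, because it then creates one dummy variable per distinct value. Whether that is what happened in the question's data is an assumption; the sketch below uses a small hypothetical data frame `df` to show how to inspect the column classes and convert if needed.

```r
# Hypothetical stand-in for a column that was read in as text instead of numbers.
df <- data.frame(
  V2 = c("27.0", "22.9", "26.3", "29.1"),
  V8 = c(89.8, 121.3, 115.2, 100.3),
  stringsAsFactors = TRUE
)

# Inspect how each column is stored; a predictor meant to be numeric
# should not show up as "factor" or "character".
print(sapply(df, class))

# Convert via character first: calling as.numeric() directly on a factor
# returns the internal level codes, not the original values.
df$V2 <- as.numeric(as.character(df$V2))

str(df)
```

If the columns do turn out to be factors, the cause is often in how the file was read (for instance a header row being treated as data), so fixing the import step may be cleaner than converting afterwards.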
Answer 1 (score: 0)
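One of the records in the data frame must contain a null value or 0.0. Before fitting the model, try imputing those records or removing them from the data frame.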
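A quick way to test this suggestion is to count missing and zero entries per column before fitting. This is a minimal sketch on a hypothetical data frame `df` with one deliberately missing value; the names `na_per_column`, `zeros_per_column` and `df_complete` are illustrative only.

```r
# Hypothetical data with a single missing value in V2.
df <- data.frame(
  V2 = c(27.0, NA, 26.3, 29.1),
  V3 = c(41.3, 66.7, 58.1, 39.9),
  V8 = c(89.8, 121.3, 115.2, 100.3)
)

na_per_column    <- colSums(is.na(df))             # missing values per column
zeros_per_column <- colSums(df == 0, na.rm = TRUE) # exact zeros per column

print(na_per_column)
print(zeros_per_column)

# Drop every row that contains at least one missing value.
df_complete <- na.omit(df)
print(nrow(df_complete))
```

If both counts are zero for every column, missing or zero records are unlikely to explain the output, and the number of rows relative to the number of coefficients is the next thing to look at.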