I have data with both continuous and categorical variables, and the response variable is either 1 or 0:
> str(test3)
'data.frame': 690 obs. of 7 variables:
$ A1 : Factor w/ 3 levels "?","a","b": 3 2 2 3 3 3 3 2 3 3 ...
$ A2 : num 30.8 58.7 24.5 27.8 20.2 ...
$ A3 : num 0 4.46 0.5 1.54 5.62 ...
$ A4 : Factor w/ 4 levels "?","l","u","y": 3 3 3 3 3 3 3 3 4 4 ...
$ A8 : num 1.25 3.04 1.5 3.75 1.71 ...
$ A11: int 1 6 0 5 0 0 0 0 0 0 ...
$ A16: num 1 1 1 1 1 1 1 1 1 1 ...
How should I fit the model? Do I need to treat the categorical and continuous variables separately? This is what I have tried:
mod3 <- glm(A16~., data=credit, family=binomial)
mod3$coefficients
summary(mod3)
But I get this warning:
glm.fit: fitted probabilities numerically 0 or 1 occurred
head(test3, n=30)
A1 A2 A3 A4 A8 A11 A16
1 b 30.83 0.000 u 1.250 1 1
2 a 58.67 4.460 u 3.040 6 1
3 a 24.50 0.500 u 1.500 0 1
4 b 27.83 1.540 u 3.750 5 1
5 b 20.17 5.625 u 1.710 0 1
6 b 32.08 4.000 u 2.500 0 1
7 b 33.17 1.040 u 6.500 0 1
8 a 22.92 11.585 u 0.040 0 1
9 b 54.42 0.500 y 3.960 0 1
10 b 42.50 4.915 y 3.165 0 1
11 b 22.08 0.830 u 2.165 0 1
12 b 29.92 1.835 u 4.335 0 1
13 a 38.25 6.000 u 1.000 0 1
14 b 48.08 6.040 u 0.040 0 1
15 a 45.83 10.500 u 5.000 7 1
16 b 36.67 4.415 y 0.250 10 1
17 b 28.25 0.875 u 0.960 3 1
18 a 23.25 5.875 u 3.170 10 1
19 b 21.83 0.250 u 0.665 0 1
20 a 19.17 8.585 u 0.750 7 1
21 b 25.00 11.250 u 2.500 17 1
22 b 23.25 1.000 u 0.835 0 1
23 a 47.75 8.000 u 7.875 6 1
24 a 27.42 14.500 u 3.085 1 1
25 a 41.17 6.500 u 0.500 3 1
26 a 15.83 0.585 u 1.500 2 1
27 a 47.00 13.000 u 5.165 9 1
28 b 56.58 18.500 u 15.000 17 1
29 b 57.42 8.500 u 7.000 3 1
30 b 42.08 1.040 u 5.000 6 1
Answer 0 (score: 1)
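Unfortunately, we can't see your full dataset. I suspect the "?" levels in A1 and A4 are a factor, but nothing else looks obviously odd. I simulated a similar dataset, and the model fits fine with or without na.omit.
The short answer is that you don't have to do anything special to tell glm about the variable types: columns stored as factors are expanded into dummy variables automatically, and numeric columns enter the model as they are.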
set.seed(2020)  # fix the RNG so the simulation is reproducible
# Simulate 100 rows with the same column types as the original data
A1 <- factor(sample(letters[1:3], size = 100, replace = TRUE))
A2 <- runif(100, min = 20, max = 70)
A3 <- runif(100, min = 0, max = 10)
A4 <- factor(sample(c("l", "u", "y", "x"), size = 100, replace = TRUE))
A8 <- runif(100, min = 0, max = 20)
A11 <- sample(0:20, size = 100, replace = TRUE)
# Binary response, roughly 90% ones
A16 <- as.numeric(sample(0:1, size = 100, replace = TRUE, prob = c(.1, .9)))
credit <- data.frame(A1, A2, A3, A4, A8, A11, A16)
str(credit)
#> 'data.frame': 100 obs. of 7 variables:
#> $ A1 : Factor w/ 3 levels "a","b","c": 3 2 1 1 2 2 1 1 2 2 ...
#> $ A2 : num 38.8 54.1 29.1 23.3 32 ...
#> $ A3 : num 0.118 2.288 0.986 3.363 5.745 ...
#> $ A4 : Factor w/ 4 levels "l","u","x","y": 4 2 2 2 2 3 2 2 2 3 ...
#> $ A8 : num 8.85 17.94 4.42 2.88 14.77 ...
#> $ A11: int 4 2 13 2 20 18 20 20 9 18 ...
#> $ A16: num 1 1 1 1 1 1 0 1 1 1 ...
mod3 <- glm(A16~., data=credit, family=binomial, na.action = na.omit)
mod3
#>
#> Call: glm(formula = A16 ~ ., family = binomial, data = credit, na.action = na.omit)
#>
#> Coefficients:
#> (Intercept) A1b A1c A2 A3 A4u
#> 0.37850 -0.49031 -0.52429 0.02990 0.07271 1.08706
#> A4x A4y A8 A11
#> 1.05172 0.38511 -0.00192 -0.02511
#>
#> Degrees of Freedom: 99 Total (i.e. Null); 90 Residual
#> Null Deviance: 69.3
#> Residual Deviance: 65.55 AIC: 85.55
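
If the "?" levels are the culprit, one thing worth trying is recoding them as missing before fitting. The warning about fitted probabilities of 0 or 1 usually points to complete or quasi-complete separation, for example a rare factor level in which every row has the same response. Below is a minimal sketch under the assumption that "?" is a placeholder for missing values in your data; the data frame toy and its values are made up for illustration and are not your dataset.

```r
# Hypothetical toy data mimicking the structure of test3 (values are invented).
set.seed(1)
n <- 200
toy <- data.frame(
  A1  = factor(sample(c("?", "a", "b"), n, replace = TRUE,
                      prob = c(0.05, 0.35, 0.60))),
  A2  = runif(n, min = 15, max = 70),
  A4  = factor(sample(c("?", "l", "u", "y"), n, replace = TRUE,
                      prob = c(0.05, 0.15, 0.50, 0.30))),
  A16 = rbinom(n, size = 1, prob = 0.5)
)

# Cross-tabulate a factor against the response. A level whose row contains
# a zero cell (all 0s or all 1s) is a candidate cause of the warning.
print(table(toy$A4, toy$A16))

# Assumption: "?" means "missing". Recode it to NA and drop the empty level.
clean <- toy
for (v in c("A1", "A4")) {
  clean[[v]][clean[[v]] == "?"] <- NA
  clean[[v]] <- droplevels(clean[[v]])
}

# Refit; na.omit drops the rows that now contain NA.
fit <- glm(A16 ~ ., data = clean, family = binomial, na.action = na.omit)

# The design matrix shows the dummy columns glm created for the factors.
print(colnames(model.matrix(fit)))
```

On your real data, run the table() check for each factor first. If a sparse level remains after the recoding, merging it with another level or dropping it are the usual options.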