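I'm trying to run backward stepwise regression on a dataset to see which variables have the greatest effect on my response variable. I have searched the data and found no empty cells or NAs. I'm still very much a beginner in R, so please bear with me if I'm missing something obvious.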
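# Read the data and fit a logistic regression on all predictors
studentreport <- read.csv("mydata.csv", header=T, sep=",")
studentreport <- data.frame(studentreport)
Fitallreport <- glm(Enrolling ~ ., data = studentreport, family = binomial)
# Stepwise selection by AIC, starting from the full model
step(Fitallreport)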
When I run it, this is what I get:
> studentreport <- read.csv("mydata.csv", header=T, sep=",")
> studentreport <- data.frame(studentreport)
> Fitallreport <- glm(Enrolling ~ ., data = studentreport, family = binomial)
Warning message:
glm.fit: algorithm did not converge
> step(Fitallreport)
Start: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT + SAT + Rank + CSize + GPA + GPAType
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT + SAT + Rank + CSize + GPA
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT + SAT + Rank + CSize
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT + SAT + Rank
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT + SAT
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major + ACT
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt + Major
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race +
TShirt
Step: AIC=394
Enrolling ~ School + City + State + Birthdate + Gender + Race
Error in step(Fitallreport) :
number of rows in use has changed: remove missing values?
In addition: There were 50 or more warnings (use warnings() to see the first
50)
>
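When I try to bring up the warnings, the output is blank. Also, if I want to answer this question, is this the right way to go about it? I was originally going to use lm, but Enrolling is just TRUE or FALSE, so I figured glm would be the better choice. Is that a wrong assumption?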
Hopefully I've provided enough information.
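On the lm versus glm point, here is a minimal sketch on made-up data showing that glm with family = binomial accepts a logical TRUE/FALSE response directly and returns fitted probabilities. The `toy` data frame and its `x` column are hypothetical stand-ins, not columns from mydata.csv.

```r
# Hypothetical toy data; not the contents of mydata.csv.
set.seed(1)
toy <- data.frame(x = rnorm(100))
# Logical response, generated so that TRUE becomes more likely as x grows.
toy$Enrolling <- runif(100) < plogis(0.5 * toy$x - 1)

# The binomial family treats FALSE as 0 and TRUE as 1.
fit <- glm(Enrolling ~ x, data = toy, family = binomial)

# Fitted values are estimated probabilities of Enrolling being TRUE.
p <- fitted(fit)
print(range(p))
```

An lm fit on the same response would not be constrained to the 0 to 1 range, which is the usual reason to prefer a binomial glm for a TRUE/FALSE outcome.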
> str(studentreport)
'data.frame': 200 obs. of 15 variables:
$ Enrolling: logi FALSE TRUE FALSE FALSE TRUE TRUE ...
$ School : Factor w/ 176 levels "Academy for the Arts, Science and Technology",..: 155 25 110 141 79 46 89 19 83 83 ...
$ City : Factor w/ 154 levels "Albany","Ames",..: 64 82 56 132 135 81 122 2 77 77 ...
$ State : Factor w/ 27 levels "AL","CA","CT",..: 6 25 25 18 3 3 2 8 18 18 ...
$ Birthdate: Factor w/ 157 levels "1/11/00","1/12/00",..: 145 107 71 153 133 85 144 127 104 104 ...
$ Gender : Factor w/ 3 levels "Female","Male",..: 2 1 1 2 1 1 1 1 1 1 ...
$ Race : Factor w/ 6 levels "A","B","D","E",..: 6 6 6 1 1 3 6 6 1 6 ...
$ TShirt : Factor w/ 5 levels "2X","L","M","S",..: 2 4 4 2 3 1 2 3 4 4 ...
$ Major : Factor w/ 35 levels "Accounting","Art History",..: 19 16 7 6 23 28 23 7 34 34 ...
$ ACT : int 28 22 25 25 31 24 29 25 24 24 ...
$ SAT : num 1950 1390 1625 1540 1625 ...
$ Rank : int 60 60 60 60 60 60 60 60 60 60 ...
$ CSize : int 337 337 337 337 337 337 337 337 337 337 ...
$ GPA : num 4.46 3.8 4.17 3.22 3.8 ...
$ GPAType : Factor w/ 2 levels "Unweighted","Weighted": 2 2 2 1 2 2 1 1 2 2
...
> summary(studentreport)
Enrolling School City
Mode :logical Lexington High School : 4 Columbia : 6
FALSE:160 Socastee High School : 3 Lexington : 6
TRUE :40 Beaufort High School : 2 Myrtle Beach: 5
Clover High School : 2 Charleston : 4
Greenwood High School : 2 Rock Hill : 4
Hilton Head Island High School: 2 Conway : 3
(Other) :185 (Other) :172
State Birthdate Gender Race TShirt
SC :78 2/25/00: 4 Female:135 A:60 2X:29
MD :15 1/17/00: 3 Male : 60 B: 2 L :38
NJ :13 3/30/00: 3 Other : 2 D:25 M :66
NY :13 4/13/00: 3 NA's : 3 E: 3 S :61
MA :11 5/19/00: 3 F:18 XL: 6
FL : 7 6/1/00 : 3 G:92
(Other):63 (Other):181
Major ACT SAT Rank
Business Administration:28 Min. :17.00 Min. : 910 Min. : 2.0
Biology :27 1st Qu.:24.00 1st Qu.:1540 1st Qu.: 60.0
Undecided :25 Median :25.00 Median :1625 Median : 60.0
Communication :15 Mean :25.02 Mean :1619 Mean : 73.8
Psychology :11 3rd Qu.:25.00 3rd Qu.:1655 3rd Qu.: 62.0
Marketing : 9 Max. :35.00 Max. :2090 Max. :426.0
(Other) :85
CSize GPA GPAType
Min. : 22.0 Min. :2.703 Unweighted: 13
1st Qu.:331.5 1st Qu.:3.428 Weighted :187
Median :337.0 Median :3.822
Mean :337.8 Mean :3.864
3rd Qu.:337.0 3rd Qu.:4.281
Max. :990.0 Max. :5.590
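The summary above lists 3 NA's under Gender, which would be consistent with the step() error about the number of rows in use changing. As a rough sketch only, on made-up data (the `toy` data frame below is hypothetical and merely borrows a few column names), this is one way to count NAs per column and fit on complete rows so that every model step() compares uses the same rows:

```r
# Hypothetical toy data; only the column names echo the real dataset.
set.seed(42)
n <- 200
toy <- data.frame(
  Enrolling = runif(n) < 0.3,
  Gender    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  ACT       = round(rnorm(n, 25, 3)),
  GPA       = round(rnorm(n, 3.8, 0.5), 2)
)
toy$Gender[c(5, 50, 150)] <- NA  # plant three missing values

# Count missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows without missing values, then fit the full model.
toy_complete <- na.omit(toy)
fit <- glm(Enrolling ~ ., data = toy_complete, family = binomial)

# Backward selection by AIC; trace = 0 suppresses the per-step printout.
reduced <- step(fit, direction = "backward", trace = 0)
print(formula(reduced))
```

Whether dropping incomplete rows is appropriate for the real data is a judgment call; the sketch only shows the mechanics.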