Automatic variable selection

Time: 2018-11-14 21:49:37

Tags: r logistic-regression forecasting feature-selection variable-selection

I have a dataset with the following columns:

Acres, FamilyType, NumBedrooms, NumChildren, NumPeople, NumRooms, NumUnits, NumVehicles, NumWorkers, OwnRent, YearBuilt, HouseCosts, ElectricBill, FoodStamp, HeatingFuel, Insurance, Language, above_150K

I ran:

# data must be the data frame object itself, not a quoted string
fit <- glm(above_150K ~ Acres + FamilyType + NumBedrooms + NumChildren +
             NumPeople + NumRooms + NumUnits + NumVehicles + NumWorkers +
             OwnRent + YearBuilt + HouseCosts + ElectricBill + FoodStamp +
             HeatingFuel + Insurance + Language,
           data = df)
summary(fit)

The output splits several columns into sub-columns, as shown below:

                        Abbreviation
Acres10+                           A
AcresSub 1                        A1
FamilyTypeMale Head               FH
FamilyTypeMarried                 FT
NumBedrooms                       NB
NumChildren                       NC
NumPeople                         NP
NumRooms                          NR
NumUnitsSingle attached           Na
NumUnitsSingle detached           Nd
NumVehicles                       NV
NumWorkers                        NW
OwnRentOutright                  ORO
OwnRentRented                    ORR
YearBuilt1940-1949             YB194
YearBuilt1950-1959             YB195
YearBuilt1960-1969             YB196
YearBuilt1970-1979             YB197
YearBuilt1980-1989             YB198
YearBuilt1990-1999             YB199
YearBuilt2000-2004            YB2000
YearBuilt2005                 YB2005
YearBuilt2006                 YB2006
YearBuilt2007                 YB2007
YearBuilt2008                 YB2008
YearBuilt2009                 YB2009
YearBuilt2010                  YB201
YearBuiltBefore 1939              Y1
HouseCosts                        HC
ElectricBill                       E
FoodStampYes                      FS
HeatingFuelElectricity           HFE
HeatingFuelGas                   HFG
HeatingFuelNone                  HFN
HeatingFuelOil              HtngFlOl
HeatingFuelOther            HtngFlOt
HeatingFuelSolar                 HFS
HeatingFuelWood                  HFW
Insurance                          I
LanguageEnglish                  LnE
LanguageOther                     LO
LanguageOther European           LOE
LanguageSpanish                   LS

As you can see, the single column HeatingFuel is broken down into:

HeatingFuelElectricity           HFE
HeatingFuelGas                   HFG
HeatingFuelNone                  HFN
HeatingFuelOil              HtngFlOl
HeatingFuelOther            HtngFlOt
HeatingFuelSolar                 HFS
HeatingFuelWood                  HFW

Why does this happen?
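For context, this looks like R's default dummy (treatment) coding of factor columns: a factor with k levels becomes k - 1 indicator columns, and the first level is absorbed into the intercept as the baseline. The sketch below reproduces that on a small made-up data frame. The name toy and its values (including the "Coal" level) are hypothetical; only the column names mirror the ones listed above.

```r
# Hypothetical toy data; only the column names come from the question.
toy <- data.frame(
  HeatingFuel = factor(c("Coal", "Gas", "Oil", "Gas", "Wood")),
  NumRooms    = c(5, 7, 4, 6, 8)
)

# model.matrix() builds the same kind of design matrix glm() uses internally.
X <- model.matrix(~ HeatingFuel + NumRooms, data = toy)

# The factor contributes one indicator column per non-baseline level;
# the numeric column stays as a single column.
colnames(X)

# The baseline is the first factor level (alphabetical by default).
levels(toy$HeatingFuel)[1]
```

Under this reading, the sub-columns are not separate variables. They are the indicator columns of one categorical predictor, which is why numeric columns such as NumRooms appear only once.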

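I want to select predictors for above_150K. I tried stepwise and all-subsets automatic variable selection, and both suggest using all of the variables. Could someone clarify?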
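As a point of comparison, here is a minimal sketch of backward stepwise selection with step() on simulated data. Everything in it is an assumption made for illustration: the data frame toy, the noise column, the simulated outcome, and the use of family = binomial (suggested by the logistic-regression tag, not by the call shown above). It is not the original analysis.

```r
set.seed(1)
n <- 200

# Hypothetical simulated data; only some column names echo the question.
toy <- data.frame(
  NumRooms    = rpois(n, 6),
  HeatingFuel = factor(sample(c("Gas", "Oil", "Wood"), n, replace = TRUE)),
  noise       = rnorm(n)
)
# Simulated binary outcome that depends on NumRooms only.
toy$above_150K <- rbinom(n, 1, plogis(-3 + 0.4 * toy$NumRooms))

# Full logistic regression (binomial family assumed for a 0/1 outcome).
full <- glm(above_150K ~ NumRooms + HeatingFuel + noise,
            family = binomial, data = toy)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
selected <- step(full, direction = "backward", trace = 0)
formula(selected)

# Likelihood-ratio test for dropping each term; a factor is tested as a whole.
drop1(full, test = "Chisq")
```

Note that step() and drop1() add or remove a factor as a single term, so individual levels such as HeatingFuelGas are never selected on their own. A selection procedure that retains every variable is also a possible outcome: it means that dropping any one term worsened the criterion (AIC here) on that data.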

0 answers:

No answers yet