I have a dataset with the following columns:
Acres, FamilyType, NumBedrooms, NumChildren, NumPeople, NumRooms, NumUnits, NumVehicles, NumWorkers, OwnRent, YearBuilt, HouseCosts, ElectricBill, FoodStamp, HeatingFuel, Insurance, Language, above_150K
I ran:
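fit <- glm(above_150K ~ Acres + FamilyType + NumBedrooms + NumChildren + NumPeople + NumRooms + NumUnits + NumVehicles + NumWorkers + OwnRent + YearBuilt + HouseCosts + ElectricBill + FoodStamp + HeatingFuel + Insurance + Language,
           data = df, family = binomial)  # pass the data frame itself (not the string 'df'); binomial for the 0/1 outcome
summary(fit)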
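A minimal runnable sketch of the call above, assuming a small simulated data frame (the values are made up; only the column names HeatingFuel, NumBedrooms, HouseCosts and above_150K come from the question). Two details matter: `data` takes the data frame object itself rather than the string 'df', and because above_150K is a 0/1 outcome, `family = binomial` requests a logistic regression instead of the default Gaussian (linear) model.

```r
# Hypothetical stand-in for the asker's df: a few of the same column names,
# filled with simulated values (not the real data).
set.seed(1)
n <- 500
df <- data.frame(
  HeatingFuel = factor(sample(c("Coal", "Electricity", "Gas", "Oil", "Wood"),
                              n, replace = TRUE)),
  NumBedrooms = sample(1:5, n, replace = TRUE),
  HouseCosts  = round(runif(n, 200, 5000))
)
# Simulated 0/1 outcome whose probability rises with HouseCosts
df$above_150K <- rbinom(n, 1, plogis(-4 + 0.001 * df$HouseCosts))

# data = df (the object), family = binomial (logistic regression)
fit <- glm(above_150K ~ HeatingFuel + NumBedrooms + HouseCosts,
           data = df, family = binomial)
summary(fit)
```

When every other column of the data frame is a predictor, `above_150K ~ .` is shorthand for writing out all the names.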
The output breaks several of the columns down further into sub-columns, as shown below:
Coefficient                  Abbreviation
Acres10+                     A
AcresSub 1                   A1
FamilyTypeMale Head          FH
FamilyTypeMarried            FT
NumBedrooms                  NB
NumChildren                  NC
NumPeople                    NP
NumRooms                     NR
NumUnitsSingle attached      Na
NumUnitsSingle detached      Nd
NumVehicles                  NV
NumWorkers                   NW
OwnRentOutright              ORO
OwnRentRented                ORR
YearBuilt1940-1949           YB194
YearBuilt1950-1959           YB195
YearBuilt1960-1969           YB196
YearBuilt1970-1979           YB197
YearBuilt1980-1989           YB198
YearBuilt1990-1999           YB199
YearBuilt2000-2004           YB2000
YearBuilt2005                YB2005
YearBuilt2006                YB2006
YearBuilt2007                YB2007
YearBuilt2008                YB2008
YearBuilt2009                YB2009
YearBuilt2010                YB201
YearBuiltBefore 1939         Y1
HouseCosts                   HC
ElectricBill                 E
FoodStampYes                 FS
HeatingFuelElectricity       HFE
HeatingFuelGas               HFG
HeatingFuelNone              HFN
HeatingFuelOil               HtngFlOl
HeatingFuelOther             HtngFlOt
HeatingFuelSolar             HFS
HeatingFuelWood              HFW
Insurance                    I
LanguageEnglish              LnE
LanguageOther                LO
LanguageOther European       LOE
LanguageSpanish              LS
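The split comes from how R builds the model matrix for a factor: with R's default treatment contrasts, a factor with k levels is encoded as k - 1 indicator (dummy) columns named variable + level, and the first level (alphabetical by default) becomes the reference level absorbed into the intercept. A sketch with `model.matrix()`, assuming a hypothetical HeatingFuel factor whose extra level "Coal" stands in for whatever the reference level is in the real data:

```r
# Hypothetical factor: the seven levels seen in the coefficient list plus an
# assumed extra level "Coal" that sorts first and so becomes the reference.
heat <- data.frame(HeatingFuel = factor(c("Coal", "Electricity", "Gas", "None",
                                          "Oil", "Other", "Solar", "Wood")))
levels(heat$HeatingFuel)   # 8 levels, sorted alphabetically by default

# Treatment contrasts: an intercept plus one 0/1 column per non-reference level
X <- model.matrix(~ HeatingFuel, data = heat)
colnames(X)
# "(Intercept)" "HeatingFuelElectricity" "HeatingFuelGas" "HeatingFuelNone"
# "HeatingFuelOil" "HeatingFuelOther" "HeatingFuelSolar" "HeatingFuelWood"
```

Numeric columns such as HouseCosts or NumRooms stay as a single column, which is why they appear once in the table.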
As you can see, the single column HeatingFuel is broken down into:
HeatingFuelElectricity HFE
HeatingFuelGas HFG
HeatingFuelNone HFN
HeatingFuelOil HtngFlOl
HeatingFuelOther HtngFlOt
HeatingFuelSolar HFS
HeatingFuelWood HFW
Why does this happen?
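Each of those coefficients is a contrast against the reference level of HeatingFuel (the level that is missing from the list), so the seven rows together describe one variable. A sketch on simulated data, with "Coal" and "Gas" as hypothetical levels, showing that `relevel()` changes which level is the baseline and therefore which coefficients are printed, while the fitted probabilities stay the same:

```r
set.seed(42)
n <- 300
# Simulated data; the level names are hypothetical
d <- data.frame(HeatingFuel = factor(sample(c("Coal", "Electricity", "Gas", "Wood"),
                                            n, replace = TRUE)))
d$y <- rbinom(n, 1, 0.3)

# Default baseline: the first level, "Coal"
fit_a <- glm(y ~ HeatingFuel, data = d, family = binomial)

# Make "Gas" the reference level instead
d$HeatingFuel <- relevel(d$HeatingFuel, ref = "Gas")
fit_b <- glm(y ~ HeatingFuel, data = d, family = binomial)

names(coef(fit_a))  # (Intercept) HeatingFuelElectricity HeatingFuelGas HeatingFuelWood
names(coef(fit_b))  # (Intercept) HeatingFuelCoal HeatingFuelElectricity HeatingFuelWood

# Same model, different parameterisation: fitted probabilities agree
all.equal(fitted(fit_a), fitted(fit_b))
```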
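I want to select the predictors of above_150K. I tried automatic variable selection with stepwise and all-subsets regression, and both suggest using all the variables. Can anyone clarify?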
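Because the dummy columns of one factor belong together, it is usually more meaningful to test or select whole terms than individual coefficients. `drop1(fit, test = "Chisq")` gives one likelihood-ratio test per term (the HeatingFuel row tests all its dummies jointly), and `step()` adds or removes whole terms by AIC; some all-subsets tools instead search over model-matrix columns, depending on the implementation. A sketch on simulated data, where `Noise` is a hypothetical predictor unrelated to the outcome:

```r
set.seed(7)
n <- 1000
# Simulated data; Noise is a hypothetical predictor with no real effect
d <- data.frame(
  HeatingFuel = factor(sample(c("Coal", "Electricity", "Gas", "Wood"), n, replace = TRUE)),
  HouseCosts  = runif(n, 200, 5000),
  Noise       = rnorm(n)
)
d$above_150K <- rbinom(n, 1, plogis(-4 + 0.001 * d$HouseCosts))

fit <- glm(above_150K ~ HeatingFuel + HouseCosts + Noise,
           data = d, family = binomial)

# One row per term, not per dummy column
tests <- drop1(fit, test = "Chisq")
print(tests)

# Backward selection by AIC; whole terms are dropped, never single dummies
sel <- step(fit, direction = "backward", trace = 0)
formula(sel)
```

With a large sample, `step()` can legitimately keep every term when each one lowers AIC, so an "all variables" result is not in itself a sign that something went wrong.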