Why don't the predictors match the variable names in the model?

Asked: 2016-10-11 10:03:38

Tags: r regression

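# stringsAsFactors = TRUE reproduces the Factor columns shown by str() below (it is no longer the default in R >= 4.0)
train <- read.csv("train.csv", header = TRUE, stringsAsFactors = TRUE)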

> str(train)
'data.frame':   188318 obs. of  132 variables:
 $ id    : int  1 2 5 10 11 13 14 20 23 24 ...
 $ cat1  : Factor w/ 2 levels "A","B": 1 1 1 2 1 1 1 1 1 1 ...
 $ cat2  : Factor w/ 2 levels "A","B": 2 2 2 2 2 2 1 2 2 2 ...
 $ cat3  : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 2 1 ...
 $ cat4  : Factor w/ 2 levels "A","B": 2 1 1 2 2 1 1 2 2 1 ...

predictors <- names(subset(train, select = -c(id,loss)))
fit <- lm(paste("loss ~ ",paste(predictors, collapse="+"),sep=""), data=train)

When I look at the summary of the fit, there are 966 coefficients. I expected only as many as there are columns in the data, minus 2 (id and loss).

> variable.names(fit)
  [1] "(Intercept)" "cat1B"       "cat2B"       "cat3B"       "cat4B"       "cat5B"       "cat6B"      
  [8] "cat7B"       "cat8B"       "cat9B"       "cat10B"      "cat11B"      "cat12B"      "cat13B"     
 [15] "cat14B"      "cat15B"      "cat16B"      "cat17B"      "cat18B"      "cat19B"      "cat20B"     
 [22] "cat21B"      "cat22B"      "cat23B"      "cat24B"      "cat25B"      "cat26B"      "cat27B"     
 [29] "cat28B"      "cat29B"      "cat30B"      "cat31B"      "cat32B"      "cat33B"      "cat34B"    

And then further down (I don't want to paste all of it):

"cat110AK"    "cat110AL"    "cat110AM"    "cat110AN"    "cat110AO"    "cat110AP"    "cat110AR"   
[393] "cat110AS"    "cat110AT"    "cat110AU"    "cat110AV"    "cat110AW"    "cat110AX"    "cat110AY"   
[400] "cat110B"     "cat110BA"    "cat110BB"    "cat110BC"    "cat110BD"    "cat110BE"    "cat110BF"   
[407] "cat110BG"    "cat110BI"    "cat110BJ"    "cat110BK"    "cat110BL"    "cat110BM"    "cat110BN"  

What are the "B" and "AT" that get appended to the variable names? And why are there more than 130 of them?
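A minimal sketch of what is most likely happening, assuming R's default treatment contrasts. `lm()` expands each factor into one indicator column per non-reference level and names each column by pasting the level onto the variable name, so "B" and "AT" would be factor levels rather than new variables. The `toy` data frame, its values, and the `cont1` column are made up for illustration and are not taken from `train.csv`.

```r
# Hypothetical toy data: two factors and one numeric predictor.
set.seed(1)
toy <- data.frame(
  cat1  = factor(rep(c("A", "B"), times = 6)),        # 2 levels: A, B
  cat2  = factor(rep(c("A", "AT", "B"), times = 4)),  # 3 levels: A, AT, B
  cont1 = rnorm(12),
  loss  = rnorm(12)
)

fit_toy <- lm(loss ~ cat1 + cat2 + cont1, data = toy)

# Each coefficient name is "<variable><level>"; the first level ("A")
# is the reference level and gets no coefficient of its own.
names(coef(fit_toy))
# "(Intercept)" "cat1B" "cat2AT" "cat2B" "cont1"

# Expected number of coefficients:
# intercept + (levels - 1) per factor + 1 per numeric predictor.
predictors_toy <- setdiff(names(toy), "loss")
is_fac <- sapply(toy[predictors_toy], is.factor)
expected <- 1 +
  sum(sapply(toy[predictors_toy][is_fac], nlevels) - 1) +
  sum(!is_fac)

expected == length(coef(fit_toy))
```

Applying the same count to `train` (dropping `id` and `loss` first) should show where a figure like 966 comes from: a factor such as `cat110`, with many levels, contributes many coefficients on its own.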

0 answers:

No answers yet.