train <- read.csv("train.csv", header = TRUE)
> str(train)
'data.frame': 188318 obs. of 132 variables:
$ id : int 1 2 5 10 11 13 14 20 23 24 ...
$ cat1 : Factor w/ 2 levels "A","B": 1 1 1 2 1 1 1 1 1 1 ...
$ cat2 : Factor w/ 2 levels "A","B": 2 2 2 2 2 2 1 2 2 2 ...
$ cat3 : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 2 1 ...
$ cat4 : Factor w/ 2 levels "A","B": 2 1 1 2 2 1 1 2 2 1 ...
predictors <- names(subset(train, select = -c(id,loss)))
fit <- lm(as.formula(paste("loss ~", paste(predictors, collapse = " + "))), data = train)
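When I look at the summary of the fit, there are 966 coefficients. Why? I expected only length(predictors) of them, i.e. the 132 columns minus the 2 that are excluded (id and loss).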
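A rough way to check where extra coefficients could come from, assuming lm()'s default treatment contrasts (contr.treatment): a factor with k levels becomes k - 1 indicator columns, while a numeric column stays a single column. The helper expected_columns() and the toy data frame are hypothetical illustrations, not part of the original code, and the count is only approximate (rows dropped for NA can also drop levels).

```r
# Hypothetical helper: how many design-matrix columns lm() should create,
# assuming the default treatment contrasts (k - 1 dummies per k-level factor).
expected_columns <- function(df, predictors) {
  per_var <- vapply(df[predictors], function(x) {
    if (is.factor(x) || is.character(x)) {
      nlevels(factor(x)) - 1L   # factor(x) drops unused levels, as lm() does
    } else {
      1L                        # a numeric column keeps a single coefficient
    }
  }, integer(1))
  1L + sum(per_var)             # + 1 for the intercept
}

# Hypothetical toy data: 3 predictors, but 4 slope coefficients
toy <- data.frame(
  loss  = c(1, 2, 3, 4),
  cat1  = factor(c("A", "B", "A", "B")),
  cat2  = factor(c("A", "AT", "B", "A")),
  cont1 = c(0.1, 0.5, 0.3, 0.9)
)
expected_columns(toy, c("cat1", "cat2", "cont1"))    # 5
ncol(model.matrix(loss ~ cat1 + cat2 + cont1, toy))  # 5
```

On the real data, comparing expected_columns(train, predictors) with length(coef(fit)) could indicate whether the 966 coefficients come from factor expansion.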
> variable.names(fit)
[1] "(Intercept)" "cat1B" "cat2B" "cat3B" "cat4B" "cat5B" "cat6B"
[8] "cat7B" "cat8B" "cat9B" "cat10B" "cat11B" "cat12B" "cat13B"
[15] "cat14B" "cat15B" "cat16B" "cat17B" "cat18B" "cat19B" "cat20B"
[22] "cat21B" "cat22B" "cat23B" "cat24B" "cat25B" "cat26B" "cat27B"
[29] "cat28B" "cat29B" "cat30B" "cat31B" "cat32B" "cat33B" "cat34B"
And then further down (I don't want to paste all of it):
"cat110AK" "cat110AL" "cat110AM" "cat110AN" "cat110AO" "cat110AP" "cat110AR"
[393] "cat110AS" "cat110AT" "cat110AU" "cat110AV" "cat110AW" "cat110AX" "cat110AY"
[400] "cat110B" "cat110BA" "cat110BB" "cat110BC" "cat110BD" "cat110BE" "cat110BF"
[407] "cat110BG" "cat110BI" "cat110BJ" "cat110BK" "cat110BL" "cat110BM" "cat110BN"
What are the "B" and "AT" that are appended to the variable names? And why are there more than 130 of them?
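The naming pattern can be reproduced on a small made-up data frame (toy and its values are hypothetical). model.matrix(), which lm() uses to build its design matrix, names each indicator column by pasting the variable name and the factor level, and under the default treatment contrasts it drops the first (baseline) level.

```r
# Hypothetical toy data: cat1 has levels A/B, cat110 has multi-letter levels
toy <- data.frame(
  loss   = c(10, 20, 30, 40, 50, 60),
  cat1   = factor(c("A", "B", "A", "B", "A", "B")),
  cat110 = factor(c("A", "AT", "B", "AT", "B", "A"))
)

# Build the design matrix the same way lm() does internally
mm <- model.matrix(loss ~ cat1 + cat110, data = toy)

# Every level except the baseline "A" becomes a column named variable + level.
# Expected column names: "(Intercept)", "cat1B", "cat110AT", "cat110B"
print(colnames(mm))
```

With two factor predictors the toy model already has three non-intercept columns; a factor like cat110 with dozens of levels expands the same way.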