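I have a linear model, model1<-lm(divorce_rate~marriage_rate+median_age+population), whose leverage plot shows an outlier at observation 28 (the state variable id for "Nevada"). I would like to fit the model without Nevada in the dataset. I tried the following but got stuck.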
data<-read.dta("census.dta")
attach(data)
data1<-data.frame(pop,divorce,marriage,popurban,medage,divrate,marrate)
attach(data1)
model1<-lm(divrate~marrate+medage+pop,data=data1)
summary(model1)
layout(matrix(1:4,2,2))
plot(model1)
dfbetaPlots(lm(divrate~marrate+medage+pop),id.n=50)
vif(model1)
dataNV<-data[!data$state == "Nevada",]
attach(dataNV)
model3<-lm(divrate~marrate+medage+pop,data=dataNV)
The last line of the code above gives me
Error in model.frame.default(formula = divrate ~ marrate + medage + pop, :
variable lengths differ (found for 'medage')
Answer 0 (score: 1)
I suspect there is some glitch in your code such that attach()ed copies of the data are still lurking in your environment, which is why it is best practice not to use attach(). The following code works for me:
library(foreign)
## best not to call data 'data'
mydata <- read.dta("http://www.stata-press.com/data/r8/census.dta")
I didn't find divrate or marrate in the data set; I'm going to speculate that you want the per capita rates:
## best practice to use a new name rather than transforming 'in place'
mydata2 <- transform(mydata,marrate=marriage/pop,divrate=divorce/pop)
model1 <- lm(divrate~marrate+medage+pop,data=mydata2)
library(car)
plot(model1)
dfbetaPlots(model1)
In a clean session, this works fine for me:
dataNV <- subset(mydata2,state != "Nevada")
## update() may be nice to avoid repeating details of the
## model specification (not really necessary in this case)
model3 <- update(model1,data=dataNV)
Or you can use the subset argument of lm() rather than building a separate data frame.
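For illustration, here is a minimal sketch of the subset-argument route, compared with refitting on a pre-filtered data frame. The data frame toy and its values are made up so that the example runs without downloading census.dta; the column names simply mirror the ones used in the question, and the model is reduced to two predictors to suit the tiny sample.

```r
## Toy stand-in for the census data; all values are invented for illustration
toy <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona", "Nevada", "Ohio", "Utah"),
  marrate = c(0.010, 0.013, 0.011, 0.140, 0.009, 0.012),
  medage  = c(29.3, 26.1, 29.2, 30.2, 29.9, 24.2),
  divrate = c(0.0069, 0.0086, 0.0073, 0.0170, 0.0054, 0.0053)
)

## Full model; every variable is looked up in 'toy', nothing is attach()ed
fit_all <- lm(divrate ~ marrate + medage, data = toy)

## Route 1: refit on a data frame with Nevada removed
fit_a <- update(fit_all, data = subset(toy, state != "Nevada"))

## Route 2: keep the data as-is and pass lm()'s subset argument
fit_b <- update(fit_all, subset = state != "Nevada")

## Both routes drop the same row, so the coefficients should agree
all.equal(coef(fit_a), coef(fit_b))
nobs(fit_b)
```

Because the subset expression is evaluated inside the data frame given by data, this form avoids creating a second object whose columns could be masked by attached copies.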