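I have the famous Titanic dataset from Kaggle, and I want to predict passenger survival with logistic regression using the glm() function in R. I first split my data frame (891 rows in total) into two data frames: train (rows 1 to 800) and test (rows 801 to 891). The code is as follows: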
```r
data <- read.csv("train.csv", stringsAsFactors = FALSE)
names(data)
#  [1] "PassengerId" "Survived" "Pclass" "Name" "Sex" "Age" "SibSp"
#  [8] "Parch" "Ticket" "Fare" "Cabin" "Embarked"

# Replacing NA values in Age column with mean value of non NA values of Age.
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)

# Converting sex into binary values. 1 for males and 0 for females.
sexcode <- ifelse(data$Sex == "male", 1, 0)

# Dividing data into train and test data frames
train <- data[1:800, ]
test <- data[801:891, ]

# Setting up the model using glm()
model <- glm(Survived ~ sexcode[1:800] + Age + Pclass + Fare, family = binomial(link = 'logit'), data = train, control = list(maxit = 50))

# Creating a data frame
newtest <- data.frame(sexcode[801:891], test$Age, test$Pclass, test$Fare)
prediction <- predict(model, newdata = newtest, type = 'response')
```
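When I run the last line of code,

    prediction <- predict(model, newdata = newtest, type = 'response')

I get the following error:

    Error in eval(expr, envir, enclos) : object 'Age' not found

Can anyone explain what the problem is? I checked the newtest variable and nothing seems to be wrong with it.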
Here is the link to the Titanic dataset: https://www.kaggle.com/c/titanic/download/train.csv

Answer 0 (score: 2)
First, you should add sexcode directly to the data frame:

    data$sexcode <- ifelse(data$Sex == "male", 1, 0)
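Then, as I commented, there is a problem with the column names in the newtest data frame, because you created it manually: its columns are not named Age, Pclass and Fare, so predict() cannot find the variables the model formula refers to. You can use the test data frame directly instead.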
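
To see why the manually built data frame fails, it can help to print its column names. The snippet below is a minimal sketch on a made-up six-row data frame (the values and the row range 4:6 are assumptions for illustration, not the real Titanic data). It shows that data.frame() derives column names from the expressions passed to it, so no column ends up being called Age.

```r
# Toy stand-in for the Titanic data (values are made up for illustration).
data <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Age      = c(22, 38, 26, 35, 54, 2),
  Sex      = c("male", "female", "female", "male", "female", "male"),
  stringsAsFactors = FALSE
)
sexcode <- ifelse(data$Sex == "male", 1, 0)
test <- data[4:6, ]

# Built from bare expressions: R turns the expressions themselves into
# syntactically valid column names.
newtest <- data.frame(sexcode[4:6], test$Age)
print(names(newtest))

# The model formula refers to a variable called Age, which is not a column here.
print("Age" %in% names(newtest))

# Naming the columns explicitly (or simply reusing `test`) avoids the mismatch.
fixed <- data.frame(sexcode = sexcode[4:6], Age = test$Age)
print(names(fixed))
```

On the real data the same thing happens with sexcode[801:891], test$Age, test$Pclass and test$Fare, which is why predict() reports that the object 'Age' cannot be found.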
So here is your complete working code:

    data <- read.csv("train.csv", stringsAsFactors = FALSE)
    data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
    data$sexcode <- ifelse(data$Sex == "male", 1, 0)
    train <- data[1:800, ]
    test <- data[801:891, ]
    model <- glm(Survived ~ sexcode + Age + Pclass + Fare, family = binomial(link = 'logit'), data = train, control = list(maxit = 50))
    prediction <- predict(model, newdata = test, type = 'response')
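
If train.csv is not at hand, the same workflow can be tried on simulated data. The following is only a sketch under stated assumptions: the 200 rows, the survival rule used to generate Survived, and the 0.5 cut-off are all invented for illustration and say nothing about the real dataset. It checks that predict() returns one probability per row of test when the predictors live in the data frame under the names used in the formula.

```r
set.seed(1)  # reproducible toy data; not the real Titanic values
n <- 200
data <- data.frame(
  Sex    = sample(c("male", "female"), n, replace = TRUE),
  Age    = round(runif(n, 1, 80)),
  Pclass = sample(1:3, n, replace = TRUE),
  Fare   = round(runif(n, 5, 100), 2),
  stringsAsFactors = FALSE
)
data$sexcode <- ifelse(data$Sex == "male", 1, 0)

# Made-up survival rule plus noise, only so the model has something to fit.
p <- plogis(1.5 - 2 * data$sexcode - 0.5 * data$Pclass)
data$Survived <- rbinom(n, 1, p)

train <- data[1:150, ]
test  <- data[151:200, ]

model <- glm(Survived ~ sexcode + Age + Pclass + Fare,
             family = binomial(link = "logit"), data = train)
prediction <- predict(model, newdata = test, type = "response")

# One predicted probability per test row.
print(length(prediction) == nrow(test))

# Turn probabilities into 0/1 labels with an assumed 0.5 cut-off
# and compare them with the simulated outcomes.
predicted_class <- ifelse(prediction > 0.5, 1, 0)
print(mean(predicted_class == test$Survived))
```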