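I am using census data to build a logistic regression model and an SVM model. First, I converted <=50K to 0 and >50K to 1 to make the outcome binary. I am trying to compute precision and recall for both models and compare which one performs better. However, for the SVM model, table(test$salary, pred1 > 0.5) returns only a FALSE column and no TRUE column (FALSE: 26 for class 0, 8 for class 1). Does anyone know what the problem is?
I am new to R and hope I can get some help here. Any help is welcome, thanks. I hope the question is clear enough.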
#setwd("C:/Users/)
Censusdata <- read.csv(file="census-data.csv", header=TRUE, sep=",")
library("dplyr", lib.loc="~/R/win-library/3.4")
# convert <=50K to 0, >50K to 1
data <- Censusdata
data$salary <- as.numeric(factor(data$salary)) - 1
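The recoding above depends on the sort order of the factor levels. As a minimal sketch, assuming the raw labels look like `<=50K` and `>50K`, possibly with surrounding spaces (the example values below are made up, not taken from the file), the mapping can be made explicit:

```r
# Hypothetical raw labels; the real column may be formatted differently
salary_raw <- c(" <=50K", " >50K", "<=50K", ">50K")

# Strip surrounding whitespace, then map ">50K" to 1 and "<=50K" to 0
salary01 <- as.integer(trimws(salary_raw) == ">50K")
salary01
# [1] 0 1 0 1
```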
library(lattice)
library(ggplot2)
library(caret)
library(e1071)  # provides svm(), which is used below
# 70% of the rows go to `test`, the remaining 30% to `train`
indexes <- sample(1:nrow(data), size = 0.7 * nrow(data))
test <- data[indexes, ]
train <- data[-indexes, ]
# logistic regression model fit
model <- glm(salary ~ education.num + hours.per.week, family = binomial, data = test)
pred <- predict(model, data = train)
summary(model)
# confusion table used to calculate precision and recall
table(test$salary, pred > 0.5)
# I get
#   FALSE TRUE
# 0    26    0
# 1     6    2
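Precision and recall can be read off a table like the one above. A minimal sketch, using the four counts shown (26, 0, 6, 2) and treating class 1 / TRUE as the positive class; the object names are illustrative only:

```r
# Counts copied from the logistic regression table above
# (matrix() fills column by column: FALSE column first, then TRUE)
cm <- matrix(c(26, 6, 0, 2), nrow = 2,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

tp <- cm["1", "TRUE"]   # actual 1, predicted TRUE
fp <- cm["0", "TRUE"]   # actual 0, predicted TRUE
fn <- cm["1", "FALSE"]  # actual 1, predicted FALSE

precision <- tp / (tp + fp)  # 2 / (2 + 0) = 1
recall    <- tp / (tp + fn)  # 2 / (2 + 6) = 0.25
c(precision = precision, recall = recall)
```

When a table has no TRUE column at all, as in the SVM case, `tp + fp` is zero and precision is undefined.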
# SVM model
model1 <- svm(salary ~ education.num + hours.per.week, family = binomial, data = test)
pred1 <- predict(model1, data = train)
table(test$salary, pred1 > 0.5)
# I get the following
#   FALSE
# 0    26
# 1     8
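A possible explanation, offered as a sketch rather than a confirmed diagnosis. `predict()` takes `newdata`, not `data`, so `data = train` is silently ignored and the fitted values for the rows used in fitting are returned. For a `glm`, `predict()` returns log-odds unless `type = "response"` is given. `svm()` from e1071 has no `family` argument and defaults to regression when the response is numeric, so its continuous predictions may simply never exceed 0.5. The example below uses simulated data as a stand-in for the census file (the data frame `toy` and its coefficients are made up) and assumes e1071 is installed.

```r
library(e1071)  # provides svm(); assumed to be installed

set.seed(42)
# Hypothetical stand-in for the census data (simulated, not the real file)
n <- 200
toy <- data.frame(
  education.num  = sample(1:16, n, replace = TRUE),
  hours.per.week = sample(20:60, n, replace = TRUE)
)
# Simulated 0/1 outcome loosely tied to the two predictors
p <- plogis(-8 + 0.45 * toy$education.num + 0.08 * toy$hours.per.week)
toy$salary <- rbinom(n, 1, p)

# 70% of the rows for fitting, the rest held out for evaluation
idx   <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[idx, ]
test  <- toy[-idx, ]

# Logistic regression: type = "response" returns probabilities
fit_glm  <- glm(salary ~ education.num + hours.per.week,
                family = binomial, data = train)
prob_glm <- predict(fit_glm, newdata = test, type = "response")
table(actual = test$salary, predicted = prob_glm > 0.5)

# SVM: a factor response makes svm() fit a classifier
train$salary_f <- factor(train$salary, levels = c(0, 1))
fit_svm  <- svm(salary_f ~ education.num + hours.per.week, data = train)
pred_svm <- predict(fit_svm, newdata = test)
table(actual = test$salary, predicted = pred_svm)
```

With a factor response, `predict()` returns class labels directly, so no 0.5 threshold is needed for the SVM.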