If I use the example code from page 52 of the e1071 documentation, I get a pred variable of class "factor":
> str(pred)
Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
That's fine, but when I run the same commands on my own data, I get a pred variable of class "numeric":
> str(pred)
Named num [1:1000] 0.95 0.0502 0.05 0.9902 -0.0448 ...
- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
This doesn't look right; the prediction doesn't seem to work at all.
My code:
# create variables to store the path to the files you downloaded:
data.dir <- "c:/kaggle/scikit/"
train.file <- paste0(data.dir, 'train.csv')
trainLabels.file <- paste0(data.dir, 'trainLabels.csv')
# READ DATA - CAREFUL IF THERE IS A HEADER OR NOT
train <- read.csv(train.file, stringsAsFactors=F, header=FALSE)
trainLabels <- read.csv(trainLabels.file, stringsAsFactors=F, header=FALSE)
# LOADING LIBRARY e1071
install.packages('e1071')
library('e1071')
## classification mode
model <- svm(train, trainLabels)
summary(model)
# test with train data
pred <- predict(model, train)
Where am I going wrong?
Answer (score: 2)
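OK, the problem was that my class labels were supplied as a data.frame rather than as a factor. e1071's svm() chooses its default type from y: C-classification when y is a factor, eps-regression otherwise. Because my labels were not a factor, it silently fitted a regression model, which is why predict() returned numbers.
I fixed it thanks to another question on converting a data.frame to a factor.
So my working code is: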
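A minimal sketch of that cause, reusing the iris setup from the e1071 manual example. It assumes e1071 is installed; the names model_cls, model_reg, pred_cls and pred_reg are illustrative only. The same predictors yield factor predictions when y is a factor and numeric predictions when y is a plain numeric vector:

```r
library(e1071)

# Predictors as in the e1071 manual's iris example
x <- subset(iris, select = -Species)

# y is a factor, so svm() defaults to type = "C-classification"
model_cls <- svm(x, iris$Species)
pred_cls  <- predict(model_cls, x)

# y is numeric (labels that were never turned into a factor),
# so svm() defaults to type = "eps-regression"
model_reg <- svm(x, as.numeric(iris$Species))
pred_reg  <- predict(model_reg, x)

class(pred_cls)  # "factor"
class(pred_reg)  # "numeric"
```

Passing type = "C-classification" explicitly is another way to make the intent visible, but converting the labels with as.factor() as in the answer keeps the default behaviour working as expected.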
data.dir <- "c:/xampp/htdocs/Big Data/kaggle/scikit/"
train.file <- paste0(data.dir, 'train.csv')
trainLabels.file <- paste0(data.dir, 'trainLabels.csv')
# READ DATA - CAREFUL IF THERE IS A HEADER OR NOT
train <- read.csv(train.file, stringsAsFactors=F, header=FALSE)
trainLabels <- read.csv(trainLabels.file, stringsAsFactors=F, header=FALSE)
# Make the trainLabels a factor
trainLabels <- as.factor(trainLabels$V1)
# APPLYING SVM TO KAGGLE DATA
install.packages('e1071')
library('e1071')
## classification mode
model <- svm(train, trainLabels)
summary(model)
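# test with train data (optimistic: use held-out data for a real accuracy estimate)
pred <- predict(model, train)
# Check accuracy: confusion matrix, then overall proportion correct
table(pred, trainLabels)
mean(pred == trainLabels)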
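A small base-R sketch of the conversion and the accuracy check, runnable without e1071 or the Kaggle files. The label values and the pred vector are hypothetical toy data standing in for trainLabels.csv and predict(model, train):

```r
# Toy stand-in for trainLabels.csv (hypothetical values): one column, no header
labels_df <- read.csv(text = "1\n0\n1\n1\n0", header = FALSE)

class(labels_df)               # "data.frame": svm() will not treat this as class labels
labels <- as.factor(labels_df$V1)
class(labels)                  # "factor"
levels(labels)                 # "0" "1"

# Hypothetical predictions standing in for predict(model, train)
pred <- factor(c("1", "0", "0", "1", "0"), levels = levels(labels))

# Confusion matrix: correct predictions sit on the diagonal
tab <- table(pred, labels)
accuracy <- sum(diag(tab)) / sum(tab)
accuracy                       # 0.8 (4 of 5 correct)
```

sum(diag(tab)) / sum(tab) gives the same value as mean(pred == labels); the table form is handy when you also want to see which classes get confused.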