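I am using a support vector machine (SVM, package e1071) in R to build a classification model and make out-of-sample predictions for a factor with 7 classes.
The problem is that the predict function returns an array much longer than the number of rows in my validation set. See the code and result below.
Any suggestions on what is going wrong? Am I misinterpreting the predict function in the SVM package?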
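# Package names must be passed as one character vector; a second positional argument is treated as the library path
install.packages(c("e1071", "caret"))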
library(e1071)
library(caret)
data <- data.frame(replicate(10,sample(0:6,1000,rep=TRUE)))
trainIndex <- createDataPartition(data[,1], p = 0.8,
                                  list = FALSE,
                                  times = 1)
trainset <- data[trainIndex,2:10]
validationset <- data[-trainIndex,2:10]
trainlabel <- data[trainIndex,1]
validationlabel <- data[-trainIndex,1]
svmModel <- svm(x = trainset,
                y = trainlabel,
                type = "C-classification",
                kernel = "radial")
# Predict
svmPred <- predict(svmModel, x = validationset)
length(svmPred)
# 800, expected 200 since validationset has nrow = 200.
Answer 0 (score: 2)
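This is because predict has no argument named x. For an e1071 model the data to predict on is passed as newdata. An argument named x is silently absorbed by ..., and when newdata is missing predict returns the fitted values for the training set, which is why you get roughly 800 values instead of 200.
Try:
svmPred <- predict(svmModel, newdata = validationset)
length(svmPred)
# should now equal nrow(validationset)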
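The effect of passing x instead of newdata to predict can be checked on a small, separate example. The following is a minimal sketch, assuming e1071 is installed; it uses the built-in iris data rather than the data from the question, and the names train_x, valid_x, wrong and right are illustrative only.

```r
library(e1071)

set.seed(42)
idx <- sample(nrow(iris), 120)            # 120 training rows, 30 held out

train_x <- iris[idx, 1:4]
train_y <- iris$Species[idx]
valid_x <- iris[-idx, 1:4]

model <- svm(x = train_x, y = train_y,
             type = "C-classification", kernel = "radial")

# Misnamed argument: x is swallowed by ..., so newdata is missing
# and the fitted values for the training rows come back.
wrong <- predict(model, x = valid_x)

# Correct argument name: one prediction per validation row.
right <- predict(model, newdata = valid_x)

length(wrong)   # 120, the number of training rows
length(right)   # 30, the number of validation rows

# The "wrong" result should simply be the training-set fit.
all(wrong == fitted(model))
```

Passing the validation data positionally, as in predict(model, valid_x), should behave the same as naming it newdata, since newdata is the second formal argument of the predict method for svm objects.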