The goal of my code is to get good accuracy, but I believe the predict function is giving me wrong results, which makes the output meaningless.
# Text mining: Corpus and Document Term Matrix
library(tm)
# KNN model
library(class)
# Stemming words
library(SnowballC)
# CrossTable
library('gmodels')
# Model training and prediction helpers (train, predict)
library(caret)
# Miscellaneous statistical functions (caret uses it for some methods)
library(e1071)
library(SparseM)
# Read the CSV with columns: Document, Terms and Category
PathFile <- read.csv(file.choose(), sep = ";", header = TRUE)
# Second CSV, used further down as the data to predict on
PathFilename <- read.csv(file.choose(), sep = ";", header = TRUE)
# Structure of the CSV file
str(PathFile)
tail(PathFile)
# Column bind category (known classification)
#mat.df <- cbind(PathFile, PathFile$Category)
#tail(mat.df)
# Change name of new column to "category"
#colnames(mat.df)[ncol(mat.df)] <- "Category"
# Split row numbers into 70% training and 30% test
train <- sample(nrow(PathFile), ceiling(nrow(PathFile) * .70))
test <- (1:nrow(PathFile))[-train]
# Show training row indices
train
# Show test row indices
test
#n <- names(PathFile)
#f <- as.formula(paste("Category ~", paste(n[!n %in% "Category"], collapse = " + ")))
#f
# Isolate classifier
cl <- PathFile[, "Category"]
# Create model data and remove "category"
modeldata <- PathFile[,!colnames(PathFile) %in% "Category"]
# Create model: training set, test set, training set classifier
knn.pred <- knn(modeldata[train, ], modeldata[test, ], cl[train], 70)
knn.pred
# Confusion matrix
conf.mat <- table("Predictions" = knn.pred, Actual = cl[test])
conf.mat
ff <- predict(knn.pred, PathFilename) # here I get an error
ct<-CrossTable(x = cl[test], y = knn.pred, prop.chisq=FALSE)
table(knn.pred,cl[test])
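# Note: knn.pred is a factor, so this draws a bar chart of predicted classes, not accuracy against k
plot(knn.pred,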
xlab = "Number of neighbours(k)",
main = "Comparison of Accuracy against k",
type = "b",
col = "black",
lwd = 1.8,
pch = "O")
studentModel <- train(Category ~ ., data = PathFile[train, ], method = "knn")
studentTestPred <- predict(studentModel, PathFile[test, ])
# Accuracy
(accuracy <- sum(diag(conf.mat))/length(test) * 100)
# Create data frame with test data and predicted category
setwd("C:/Users/Public/Desktop/")
df.pred <- cbind(knn.pred, modeldata[test, ])
write.table(df.pred, file="output.csv", sep=";")
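For reference on the failing `predict()` call in the script above: as far as I understand, `class::knn()` does not return a fitted model. It returns the predicted classes directly as a factor, so there is nothing for `predict()` to work on. A minimal sketch of classifying new rows is to pass them as the `test` argument instead. The built-in `iris` data stands in for my CSV here, and `new_rows` and `k = 5` are placeholders, not values from my data.

```r
library(class)

# Stand-in data: iris replaces the CSV from the question (assumption for illustration)
set.seed(42)
idx <- sample(nrow(iris), ceiling(nrow(iris) * 0.70))
features <- iris[, colnames(iris) != "Species"]
labels <- iris$Species

# Hypothetical "new" rows to classify; they need the same feature columns as the training rows
new_rows <- features[-idx, ]

# knn() has no separate fit step: the rows to classify go in the `test` argument
new_pred <- knn(train = features[idx, ], test = new_rows, cl = labels[idx], k = 5)

# One predicted label per new row
head(new_pred)
```

With the script above, this would presumably mean calling `knn()` again with the feature columns of `PathFilename` as `test`, provided its columns match `modeldata`.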
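The predict function usually gives me this kind of result. This is the confusion matrix output: only a single row of predictions is filled in???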
           Actual
Predictions Art Eco Env Hea Pol Sci Spo
        Art  25  31  26  37  33  27  27
        Eco   0   0   0   0   0   0   0
        Env   0   0   0   0   0   0   0
        Hea   0   0   0   0   0   0   0
        Pol   0   0   0   0   0   0   0
        Sci   0   0   0   0   0   0   0
        Spo   0   0   0   0   0   0   0
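A matrix like the one above, where every test row is assigned to one class, may mean the choice of k or the feature scaling is worth revisiting. That is a guess, since it cannot be confirmed without the data. A rough sketch of comparing accuracy across several values of k, which is what the plot labels in the script seem to aim for, could look like this. Again `iris` is a stand-in for the CSV, and the range of k values is arbitrary.

```r
library(class)

# Stand-in data: iris replaces the CSV from the question (assumption for illustration)
set.seed(42)
idx <- sample(nrow(iris), ceiling(nrow(iris) * 0.70))
y <- iris$Species

# kNN is distance-based, so scale features using the training rows' mean and sd
train_x <- scale(iris[idx, 1:4])
test_x <- scale(iris[-idx, 1:4],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

# Accuracy (in percent) on the held-out rows for each candidate k
ks <- seq(1, 25, by = 2)
acc <- sapply(ks, function(k) {
  pred <- knn(train_x, test_x, y[idx], k = k)
  mean(pred == y[-idx]) * 100
})

plot(ks, acc, type = "b",
     xlab = "Number of neighbours (k)",
     ylab = "Accuracy (%)",
     main = "Comparison of Accuracy against k")
```

Here the x axis really is k, unlike `plot(knn.pred, ...)`, which only shows how often each class was predicted.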