Predicting new CSV data with a neural network

Date: 2018-12-04 13:39:47

Tags: r csv prediction

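I want to use the neuralnet package to predict new data stored in a CSV file. What I find strange is that the output gives me the same constant value for every predicted row. I am not sure whether I am doing this correctly. My network's accuracy is 95.3%.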
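An accuracy figure on its own can be hard to interpret when one class dominates, so it may be worth comparing it with the share of the majority class. The sketch below uses an invented label vector, `labels`, rather than the real Rendu column; the 95/5 split is an assumption chosen only for illustration.

```r
# Hypothetical labels: 95 "NO" and 5 "YES" (not the data from the question)
labels <- factor(c(rep("NO", 95), rep("YES", 5)))

# Class counts and the accuracy obtained by always predicting the most frequent class
counts   <- table(labels)
baseline <- max(counts) / sum(counts)

# A classifier that answers "NO" for every row reaches the same accuracy
always_no <- factor(rep("NO", length(labels)), levels = levels(labels))
acc <- mean(always_no == labels)

print(counts)
print(baseline)
print(acc)
```

If `table(dataset$Rendu)` showed a comparable imbalance, an accuracy near 95% would be consistent with a network that always predicts one class. This is only something to check, not a confirmed diagnosis.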

My training data has the following structure:

> head(dataset)
        POS Freq MAF Coverage protdesc Rendu
1 212812097 40.5   1     3994        1    NO
...
3 212812095 50.5   0     4000        0    YES

My class, Rendu, has two values. I want to predict Rendu for the rows of my CSV file, which has the following structure:

     POS Freq MAF Coverage protdesc
1  25398280 24.4   0     1962        0

Here is my code:

library(neuralnet)
library(caret)
library(reshape2)
library(ggplot2)
library(stringr)
# define the filename
filename <- "/home/rico/Documents/TrainingMutNGS.csv"
# load the CSV file from the local directory
dataset <- read.csv(filename, header=FALSE)
# set the column names in the dataset
colnames(dataset) <- c("POS","Freq","MAF","Coverage","protdesc","Rendu")
str(dataset)
# Split into Train and Validation sets
# Training Set : Validation Set = 70 : 30 (random)
set.seed(100)
train <- sample(nrow(dataset), 0.7*nrow(dataset), replace = FALSE)
TrainSet <- dataset[train,]
ValidSet <- dataset[-train,]
summary(TrainSet)
summary(ValidSet)
dataset[,1:5] <- scale(dataset[,1:5])
dataset <- cbind(dataset, model.matrix(~ 0 + Rendu, dataset))
trainInds <- sample(1:dim(dataset)[1],8550)
# Get the indices for the test data
testInds <- setdiff(1:11400, trainInds)
#Get the variable names for the input and output
predictorVars <- names(dataset)[1:5]
outcomeVars   <- names(dataset)[7:8]
#Paste together the formula 
modFormula <- as.formula(paste(paste(outcomeVars, collapse = "+"), "~", paste(predictorVars, collapse = " + ")))
modFormula
MutNet <- neuralnet(formula = modFormula, data=dataset[trainInds,],hidden = c(4), act.fct = "logistic",linear.output = FALSE, lifesign = "minimal")
classes <- neuralnet::compute(MutNet, dataset[testInds,-c(6:8)])
classRes <- classes$net.result
nnClass <- apply(classRes, MARGIN = 1, which.max)
origClass <- apply(dataset[testInds, c(7:8)], MARGIN = 1, which.max)
paste("The classification accuracy of the network is", round(mean(nnClass == origClass) * 100, digits = 2), "%")
#Load csv file for prediction
library(readr)
Variants <- read.table("/home/rico/Documents/test.csv", sep = ";", header = TRUE, dec = ".", fill = TRUE)
predi <- neuralnet::compute(MutNet,Variants[,1:5])
predi

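I know that all the rows I included in the test file should be "YES", but the output of predi gives me exactly the same values for every row, which I find strange. Any ideas?

$net.result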

             [,1]          [,2]
 [1,] 0.953410406 0.04689899119
 [2,] 0.953410406 0.04689899119
 [3,] 0.953410406 0.04689899119
 [4,] 0.953410406 0.04689899119
 [5,] 0.953410406 0.04689899119
 [6,] 0.953410406 0.04689899119
 [7,] 0.953410406 0.04689899119
 [8,] 0.953410406 0.04689899119
 [9,] 0.953410406 0.04689899119
[10,] 0.953410406 0.04689899119
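One possible explanation, offered as a guess rather than a diagnosis: the network was trained on columns standardised with `scale()`, while `Variants[,1:5]` is passed to `compute()` in raw units. With inputs as large as POS = 25398280, the logistic hidden units may saturate and return the same output for every row. The base-R sketch below shows how the training centre and spread can be reused on new rows. The data frames `train_raw` and `new_raw` and most of their values are made up for illustration, and only three of the five predictor columns are included.

```r
# Hypothetical toy data standing in for the training predictors and the new CSV rows
train_raw <- data.frame(
  POS      = c(212812097, 212812095, 25398280, 25398284),
  Freq     = c(40.5, 50.5, 24.4, 30.0),
  Coverage = c(3994, 4000, 1962, 2500)
)
new_raw <- data.frame(
  POS      = c(25398280, 212812090),
  Freq     = c(24.4, 45.0),
  Coverage = c(1962, 3900)
)

# Standardise the training predictors and keep the parameters that scale() used
train_scaled <- scale(train_raw)
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")

# Apply the SAME centre and scale to the new rows
# (calling scale() on the new rows alone would use different parameters)
new_scaled <- scale(new_raw, center = ctr, scale = scl)

print(new_scaled)
```

In the script from the question, this would mean storing the attributes of `scale(dataset[,1:5])` before the columns are overwritten, transforming `Variants[,1:5]` with them, and passing the result to `neuralnet::compute(MutNet, ...)`. Whether this removes the constant output is an assumption that would need to be verified on the real data.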

0 answers:

No answers