I am trying to run this code. Everything works fine until I try to create my plot.
# Install package to use Support Vector Machine Algorithm
install.packages("e1071")
# If this function does not work click on the packages tab and check e1071
library(e1071)
# Choose File
diabetes <- read.csv(file.choose(), na.strings = "?")
View(diabetes)
##### Data Preprocessing
# Count number of rows with missing data
sum(!complete.cases(diabetes))
# Summary of data set
summary(diabetes)
str(diabetes)
# Recode "NO" and ">30" as 0, and "<30" as 1
diabetes$readmitted<-as.character(diabetes$readmitted)
diabetes$readmitted[diabetes$readmitted== "NO"] <- "0"
diabetes$readmitted[diabetes$readmitted== "<30"] <- "1"
diabetes$readmitted[diabetes$readmitted== ">30"] <- "0"
diabetes$readmitted<-factor(diabetes$readmitted)
str(diabetes$readmitted)
summary(diabetes$readmitted)
# Removal of insignificant variables
diabetes$encounter_id<-NULL
diabetes$patient_nbr<-NULL
diabetes$weight<-NULL # Weight had too many missing values to be a part of our model
diabetes$payer_code<-NULL
diabetes$medical_specialty<-NULL
diabetes$nateglinide<-NULL
diabetes$chlorpropamide<-NULL
diabetes$acetohexamide<-NULL
diabetes$tolbutamide<-NULL
diabetes$acarbose<-NULL
diabetes$miglitol<-NULL
diabetes$troglitazone<-NULL
diabetes$tolazamide<-NULL
diabetes$examide<-NULL
diabetes$citoglipton<-NULL
diabetes$glyburide.metformin<-NULL
diabetes$glipizide.metformin<-NULL
diabetes$glimepiride.pioglitazone<-NULL
diabetes$metformin.rosiglitazone<-NULL
diabetes$metformin.pioglitazone<-NULL
# Change variables to be factors
diabetes$admission_type_id<-factor(diabetes$admission_type_id)
diabetes$discharge_disposition_id<-factor(diabetes$discharge_disposition_id)
diabetes$admission_source_id<-factor(diabetes$admission_source_id)
str(diabetes)
# Summary after data pre-processing
summary(diabetes)
# Set Seed and split data set into training and test data
set.seed(1234)
ind <- sample(2, nrow(diabetes), replace = TRUE, prob = c(0.7, 0.3))
train.data <- diabetes[ind == 1, ]
test.data <- diabetes[ind == 2, ]
# Create Model using readmitted as dependent variable
model1<-readmitted~.
model1<-svm(readmitted~., data=train.data)
summary(model1)
plot(model1, diabetes, type='C-classification', kernel='radial')
### I am also having trouble here making the tables###########
# Create table of model vs training data in confusion matrix
table(predict(model1), train.data$readmitted)
# Pull Test data to get confusion matrix
testPred <- predict(model1, newdata = test.data)
table (testPred, test.data$readmitted)
# Create second model using select readmitted and select variables
model2<-readmitted~race + gender + age + admission_type_id + discharge_disposition_id + time_in_hospital + num_lab_procedures + num_procedures + num_medications + number_outpatient + number_inpatient + number_emergency + number_diagnoses + change + diabetesMed
model2<-svm(model2, data=train.data)
summary(model2)
### Also having trouble here making the second table#########
# Create table using second model and training data
table(predict(model2), train.data$readmitted)
testPred2 <- predict(model2, newdata = test.data)
table (testPred2, test.data$readmitted)
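I have been playing around with the plot and the tables, and nothing seems to work.
I have been testing with a data set of 9,999 rows, but my real data set has 107,000 rows, so each run takes a long time before I find out I got something wrong. Any help would be greatly appreciated. Thanks.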
Answer 0 (score: 1)
Well, I would need the data you are working with. I have run into these problems on large data sets myself.
Beyond that, look into tooling for fast data processing, so that you can work with the whole data set and visualize large data sets.
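On the confusion matrices, one likely cause (an assumption, since no error message was posted) is missing values. The script reads "?" as NA, and both svm() and predict() in e1071 drop incomplete rows by default (na.action = na.omit), so the predictions can end up shorter than train.data$readmitted, which makes table() fail. A minimal sketch of one way around it, using iris with a few injected NAs as a stand-in for the real data (toy, toy_cc and conf_mat are hypothetical names):

```r
library(e1071)

# Toy stand-in for the real data: iris with a few missing values injected.
toy <- iris
toy$Sepal.Length[c(5, 60, 120)] <- NA

# Drop incomplete rows up front so the data and the predictions stay aligned.
toy_cc <- toy[complete.cases(toy), ]

fit <- svm(Species ~ ., data = toy_cc)
pred <- predict(fit, newdata = toy_cc)

# Both vectors now have the same length, so table() can cross-tabulate them.
conf_mat <- table(predicted = pred, actual = toy_cc$Species)
print(conf_mat)
```

In the question's script the same idea would mean filtering train.data and test.data with complete.cases() before fitting and predicting, assuming that dropping those rows is acceptable for the analysis.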
I am not sure what error your plot is giving. Please post the error message.
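In the meantime, one thing worth checking, assuming the model really was fit on more than two predictors: plot() for an svm object needs a formula naming the two variables to put on the axes, and a slice list to hold the other predictors at fixed values. Also, type and kernel are arguments of svm(), not of plot(). A minimal sketch on iris, used here only as a stand-in for the diabetes data:

```r
library(e1071)

# Toy stand-in for the real data: iris has four numeric predictors.
fit <- svm(Species ~ ., data = iris)

# With more than two predictors, the formula picks the two plotted dimensions
# and `slice` fixes the remaining predictors at chosen values.
plot(fit, iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```

For the question's model this would mean picking two numeric columns, for example num_medications and time_in_hospital, for the formula and plotting against train.data, which is the data the model was fit on.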