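I'm using CrossTable to evaluate a kNN model, but the output isn't displayed as expected. It shows me a pile of numbers rather than the predictions from the model. (I would add an image, but I need 10 reputation points.) I want a clearly formatted table as output.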
#set the working directory
setwd("F:/Level 5/CT5018 - Data Analytics/My project/Official Dataset - Adult")
#record the start time so the run time can be measured
k <- Sys.time()
#read the CSV file into the adults data frame
adults <- read.csv("Adults.csv", stringsAsFactors = FALSE)
#examine the structure of the adults data frame
str(adults)
#drop the fnlwgt feature
adults <- adults[-3]
#table of sex
table(adults$Sex)
#recode Sex as a factor
adults$Sex <- factor(adults$Sex, levels = c("Female", "Male"),
                     labels = c("Women", "Men"))
#table of proportions with more informative labels
round(prop.table(table(adults$Sex)) * 100, digits = 1)
#summarize all numeric features
summary(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")])
#---------------------------- Min-Max normalisation ----------------------------
#create normalization function
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
#test normalization function - result should be identical
normalize(c(1, 2, 3, 4, 5))
normalize(c(10, 20, 30, 40, 50))
#normalize the numeric features of the adults data
adultsN <- as.data.frame(lapply(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")], normalize))
#confirm that normalization worked
summary(adultsN$Age)
# create training and test data
adultsTrain <- adultsN[1:14999, ]
adultsTest <- adultsN[15000:19999, ]
# create labels for training and test data
adultsTrainLabels <- adults[1:14999, 1]
adultsTestLabels <- adults[15000:19999, 1]
#installing package class
#install.packages("class")
library(class)
adultsTestPred <- knn(train = adultsTrain, test = adultsTest,
                      cl = adultsTrainLabels, k = 122)
#installing package for cross tables
#install.packages("gmodels")
library(gmodels)
# Create the cross tabulation of predicted vs. actual
CrossTable(x = adultsTestLabels, y = adultsTestPred,
           prop.chisq = FALSE)
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  5000

                 | adultsTestPred
adultsTestLabels |        17 |        18 |        19 |        20 | ... | Row Total |
-----------------|-----------|-----------|-----------|-----------|-----|-----------|
              17 |        65 |         6 |         1 |         1 | ... |        74 |
                 |     0.878 |     0.081 |     0.014 |     0.014 | ... |     0.015 |
                 |     0.556 |     0.200 |     0.012 |     0.005 | ... |

(The pasted output is cut off here. The predicted columns run through every value 17 to 69, then 71, 72, 73, 76, 77 and 90. In the row shown, the remaining cells are 0 apart from a single count in column 30, with row proportion 0.014 and column proportion 0.004.)

--------------------------------------------
Where I actually want this:
-----------------------------------------------

   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  2000

             | predicted Sex
  actual Sex |    Female |      Male | Row Total |
-------------|-----------|-----------|-----------|
      Female |       514 |       161 |       675 |
             |     0.257 |     0.080 |           |
-------------|-----------|-----------|-----------|
        Male |       162 |      1163 |      1325 |
             |     0.081 |     0.582 |           |
-------------|-----------|-----------|-----------|
Column Total |       676 |      1324 |      2000 |
-------------|-----------|-----------|-----------|
Answer 0 (score: 0)
I ran into the same problem. My mistake was mixing up the id column and the label column.
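One quick way to spot this kind of mix-up is to count the distinct values in the vector you pass as labels. The sketch below uses a small made-up data frame (the names `x`, `Id`, `label` and `Feature1` are hypothetical, not taken from the Adults data) to show how a label vector pulled from the wrong column ends up with one "class" per distinct value.

```r
# Hypothetical data frame laid out as [Id, label, Feature1]
x <- data.frame(
  Id       = 1:6,
  label    = c("Female", "Male", "Male", "Female", "Male", "Female"),
  Feature1 = c(0.2, 0.9, 0.7, 0.1, 0.8, 0.3),
  stringsAsFactors = FALSE
)

# Check which column actually sits in position 1
first_col <- names(x)[1]

wrong <- x[, 1]         # picks up Id, not the label
right <- x[, "label"]   # selecting by name avoids position mistakes

# A genuine class label should have only a handful of distinct values
n_wrong <- length(unique(wrong))
n_right <- length(unique(right))
print(c(first_col = first_col, n_wrong = n_wrong, n_right = n_right))
```

If the count for your label vector is close to the number of distinct ages or ids in the data, the cross table will get one row and one column per value, which is likely what produces the very wide output in the question.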
My data frame was laid out like x = [Id, label, Feature 1, Feature 2, ...], and I assigned the labels as x[1] instead of x[2]. Try extracting the labels before normalizing.
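Applied to the question, this would mean taking the labels from the `Sex` column by name rather than from column 1, which `summary()` in the question suggests is `Age`. Here is a minimal sketch on synthetic data, since I don't have Adults.csv: the `toy` data frame, the split sizes and `k = 5` are all assumptions made for illustration.

```r
library(class)  # provides knn()

set.seed(42)
n <- 200
# Synthetic stand-in for the Adults data; column 1 is Age, as in the question
toy <- data.frame(
  Age            = sample(17:90, n, replace = TRUE),
  Hours.per.week = sample(10:80, n, replace = TRUE),
  Sex            = sample(c("Female", "Male"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Take the labels by column name, before building the normalized feature frame
labels   <- factor(toy$Sex, levels = c("Female", "Male"))
features <- as.data.frame(lapply(toy[c("Age", "Hours.per.week")], normalize))

train_idx <- 1:150
test_idx  <- 151:200

pred <- knn(train = features[train_idx, ], test = features[test_idx, ],
            cl = labels[train_idx], k = 5)

# 2 x 2 confusion table, because the class labels are Sex and not Age
conf <- table(actual = labels[test_idx], predicted = pred)
print(conf)

# Same comparison with gmodels, if it is installed; prop.r and prop.c are
# switched off so each cell shows only N and N / Table Total
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(x = labels[test_idx], y = pred,
                      prop.chisq = FALSE, prop.r = FALSE, prop.c = FALSE,
                      dnn = c("actual Sex", "predicted Sex"))
}
```

In the question's own code the equivalent change would presumably be `adultsTrainLabels <- adults$Sex[1:14999]` and `adultsTestLabels <- adults$Sex[15000:19999]`. The synthetic labels here are random, so the accuracy in this sketch means nothing; only the shape of the table matters.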