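R (unexpected output from CrossTable)

Date: 2015-01-08 16:29:49

Tags: r rstudio

I'm using CrossTable to evaluate a kNN model, but the output isn't displayed as expected. It shows me a pile of numbers rather than the predicted classes. (I would add an image, but I need 10 reputation points.) I want a clean, readable table.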

#I'm setting working directory folder
setwd("F:/Level 5/CT5018 - Data Analytics/My project/Official Dataset - Adult")

#start calculating the time to run the code
k <-Sys.time()

#here I'm assigning adults to read the csv file
adults <- read.csv("Adults.csv", stringsAsFactors = FALSE)

#examine the structure of the adults data frame
str(adults)

#drop the fnlwgt feature
adults <- adults[-3]

#table of sex
table(adults$Sex)

#recode Sex as a factor
adults$Sex <- factor(adults$Sex, levels = c("Female","Male"),
                       labels = c("Women", "Men"))

#table of proportions with more informative labels
round(prop.table(table(adults$Sex)) * 100, digits = 1)

#summarize all numeric features
summary(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")])

#---------------------------------------------- Min-Max normalisation ----------------------------------------------


#create normalization function
normalize <- function(x) {
  return ((x - min (x)) / (max(x) - min(x)))
}

#test normalization function - result should be identical
normalize(c(1, 2, 3, 4, 5))
normalize(c(10, 20, 30, 40, 50))

#normalize the numeric features of the adults data
adultsN <- as.data.frame(lapply(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")], normalize))

#confirm that normalization worked
summary(adultsN$Age)

# create training and test data
adultsTrain <- adultsN[1:14999, ]
adultsTest <- adultsN[15000:19999, ]

# create labels for training and test data
adultsTrainLabels <- adults[1:14999, 1]
adultsTestLabels <- adults[15000:19999, 1]

#installing package class
#install.packages("class")
library(class)

adultsTestPred <- knn(train = adultsTrain, test = adultsTest,
                      cl = adultsTrainLabels, k=122)

#installing package for cross tables
#install.packages("gmodels")
library(gmodels)

# Create the cross tabulation of predicted vs. actual
CrossTable(x = adultsTestLabels, y = adultsTestPred,
           prop.chisq=FALSE)

This is what it shows me:

 Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  5000 


| adultsTestPred 
adultsTestLabels |        17 |        18 |        19 |        20 |        21 |        22 |        23 |        24 |        25 |        26 |        27 |        28 |        29 |        30 |        31 |        32 |        33 |        34 |        35 |        36 |        37 |        38 |        39 |        40 |        41 |        42 |        43 |        44 |        45 |        46 |        47 |        48 |        49 |        50 |        51 |        52 |        53 |        54 |        55 |        56 |        57 |        58 |        59 |        60 |        61 |        62 |        63 |        64 |        65 |        66 |        67 |        68 |        69 |        71 |        72 |        73 |        76 |        77 |        90 | Row Total | 
-----------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
              17 |        65 |         6 |         1 |         1 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         1 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |        74 | 
                 |     0.878 |     0.081 |     0.014 |     0.014 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.014 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.015 | 
                 |     0.556 |     0.200 |     0.012 |     0.005 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.004 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 | 

--------------------------------------------
Whereas what I actually want is this:
-----------------------------------------------
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  2000 


             | predicted Sex 
  actual Sex |    Female |      Male | Row Total | 
-------------|-----------|-----------|-----------|
      Female |       514 |       161 |       675 | 
             |     0.257 |     0.080 |           | 
-------------|-----------|-----------|-----------|
        Male |       162 |      1163 |      1325 | 
             |     0.081 |     0.582 |           | 
-------------|-----------|-----------|-----------|
Column Total |       676 |      1324 |      2000 | 
-------------|-----------|-----------|-----------|

1 answer:

Answer 0 (score: 0)

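I ran into the same problem. The mistake I made was mixing up the id column and the label column.

My data frame looked like x = [Id, label, Feature 1, Feature 2 ....], and I assigned the labels as x[1] instead of x[2]. Try extracting the labels before normalizing.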
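Applied to the code in the question, the likely cause is that adults[1:14999, 1] selects the first column, which is presumably Age (the row and column headings 17 to 90 in the output look like ages), rather than Sex. A minimal sketch of the fix follows. It uses a small synthetic data frame in place of Adults.csv, so the values, the row counts and k = 5 are made up for illustration; only the idea of selecting the label by name rather than by position carries over.

```r
library(class)  # provides knn(); ships with standard R distributions

set.seed(1)

# Synthetic stand-in for the adults data frame (all values are made up).
n <- 200
adults <- data.frame(
  Age            = sample(17:90, n, replace = TRUE),
  Hours.per.week = sample(10:60, n, replace = TRUE),
  Sex            = sample(c("Female", "Male"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same min-max normalization as in the question.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Take the labels by column NAME, before building the normalized features,
# so a change in column order cannot silently swap in Age or an id column.
labels <- factor(adults$Sex, levels = c("Female", "Male"))

features <- as.data.frame(lapply(adults[c("Age", "Hours.per.week")], normalize))

# Hypothetical train/test split for the synthetic data.
trainIdx <- 1:150
testIdx  <- 151:200

pred <- knn(train = features[trainIdx, ], test = features[testIdx, ],
            cl = labels[trainIdx], k = 5)

# The cross tabulation now has Female/Male rows and columns.
confusion <- table(actual = labels[testIdx], predicted = pred)
print(confusion)
```

With gmodels loaded, passing the same two vectors to CrossTable() with prop.r = FALSE, prop.c = FALSE, prop.chisq = FALSE and dnn = c("actual Sex", "predicted Sex") should give a layout close to the one the question asks for, showing only N and N / Table Total in each cell.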