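I have two vectors, the actual values and the predicted values. Both are factors with 8 levels. Level 8 has only 55 observations in the actual values and 0 in the predicted values. However, when I build the confusion matrix, the level 8 observations disappear or get shifted somehow. Shouldn't the column sums match the actual counts?
I built the confusion matrix in two different ways to double-check. I also tried explicitly making the factor levels identical in both vectors. No luck so far.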
library(nnet); library(caret)
sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")
# First column is ID
sc$LeagueIndex <- as.factor(sc$LeagueIndex)
sc <- sc[, -1]
# Set missing values to NA
which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- NA
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)
# Set impossible values to NA
sc$TotalHours[sc$Age < sc$TotalHours/8760] <- NA
sc$HoursPerWeek[sc$HoursPerWeek >= 168] <- NA
# Fit model and store predictions
sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)
# sc_fitted1 is missing factor level 8
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)
# sc_fitted1 has factor level 8
levels(sc_fitted1) <- levels(sc$LeagueIndex)
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)
# What's the problem?
table(sc$LeagueIndex)
length(sc$LeagueIndex)
table(sc_fitted1)
length(sc_fitted1)
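Answer 0 (score: 1)
It has to do with the NA values you generate: they all fall in level 8 of the target variable. As far as I can tell, rows with a missing predictor are left out of the fit and do not get a usable class prediction, so they never reach the confusion matrix. If you want level 8 to be taken into account, you will probably have to find another way to encode those NAs.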
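The effect on the table can be reproduced on a small scale. The sketch below uses made-up toy factors (`actual` and `predicted` here are hypothetical, not the SkillCraft data) and assumes the predictions for the affected rows are NA. By default `table()` drops any pair that contains an NA, while `useNA = "ifany"` keeps them in a separate row:

```r
# Toy data (hypothetical): class "c" only occurs where the prediction is NA
actual    <- factor(c("a", "a", "b", "b", "c", "c"))
predicted <- factor(c("a", "b", "b", "b", NA, NA), levels = levels(actual))

# Default behaviour: pairs containing NA are silently dropped,
# so the column for class "c" sums to 0
tab_default <- table(predicted = predicted, actual = actual)
colSums(tab_default)

# useNA = "ifany" adds an <NA> row for the missing predictions,
# so the column sums agree with table(actual) again
tab_na <- table(predicted = predicted, actual = actual, useNA = "ifany")
colSums(tab_na)
table(actual)
```

On the real data, the same check would be `table(predicted = sc_fitted1, actual = sc$LeagueIndex, useNA = "ifany")`.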
Try this as a counterexample:
library(nnet); library(caret)
sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")
sc$LeagueIndex <- as.factor(sc$LeagueIndex)
sc <- sc[, -1]
which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- 20 # this is just a random numeric value (not the best one to use!)
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)
sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)
It will give you something like this:
         actual
predicted   1   2   3   4   5   6   7   8
        1  52  30   9   2   0   0   0   0
        2  61 123  78  58   4   1   0   0
        3  30  77 142  79  23   4   0   0
        4  21 104 248 410 252  45   0   0
        5   2  11  60 217 343 230   1   0
        6   1   2  16  45 184 333  32   2
        7   0   0   0   0   0   5   2   0
        8   0   0   0   0   0   3   0  53
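Since 20 is only an arbitrary placeholder, a somewhat less arbitrary encoding is to keep a flag for "was missing" and fill the gap with the observed median. This is just a sketch on a made-up column (`df`, `hours` and `hours_missing` are hypothetical names, not part of the dataset), and whether it suits the SkillCraft data is a separate question, because the flag would largely coincide with level 8:

```r
# Hypothetical toy column with missing values
# (stands in for Age, HoursPerWeek or TotalHours)
df <- data.frame(hours = c(10, 20, NA, 40, NA))

# Record which rows were missing before overwriting them
df$hours_missing <- as.integer(is.na(df$hours))

# Fill the gaps with the median of the observed values
df$hours[is.na(df$hours)] <- median(df$hours, na.rm = TRUE)

df
```

With no NAs left in the predictors, every row takes part in the fit and gets a prediction, which is what the counterexample above relies on.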