I am trying to use knn on my dataset, which has 65,499 rows and 6 columns.
My dataset:
> dput(head(sampleknn))
structure(list(RequestorSeniority = c(1L, 2L, 2L, 4L, 1L, 4L),
ITOwner = c(50L, 15L, 15L, 22L, 22L, 38L), Severity = c(2L,
1L, 2L, 2L, 2L, 2L), Priority = c(0L, 1L, 0L, 0L, 1L, 3L),
daysOpen = c(3L, 5L, 0L, 20L, 1L, 0L), Satisfaction = structure(c(4L,
4L, 3L, 3L, 4L, 3L), .Label = c("Amazing", "Satisfied", "Unknown",
"Unsatisfied"), class = "factor")), .Names = c("RequestorSeniority",
"ITOwner", "Severity", "Priority", "daysOpen", "Satisfaction"
), row.names = c(NA, 6L), class = "data.frame")
> str(sampleknn)
'data.frame': 65499 obs. of 6 variables:
$ RequestorSeniority: int 1 2 2 4 1 4 3 4 2 3 ...
$ ITOwner : int 50 15 15 22 22 38 10 1 14 46 ...
$ Severity : int 2 1 2 2 2 2 2 2 2 2 ...
$ Priority : int 0 1 0 0 1 3 3 0 2 1 ...
$ daysOpen : int 3 5 0 20 1 0 9 15 6 1 ...
$ Satisfaction : Factor w/ 4 levels "Amazing","Satisfied",..: 4 4 3 3 4 3 3 3 4 4 ...
Now I am trying to run knn on this dataset (code below), and it gives the following error:
Error in knn(train = sampleknn_train, test = sampleknn_test, cl = sampleknn_test_target, : 'train' and 'class' have different lengths
Code:
library(class)  # knn() comes from the class package
sampleknn <- read.csv(file="HelpDesk.csv",head=TRUE,sep=",")
str(sampleknn)
#---scaling
normalize <- function(x) {
return((x-min(x))/(max(x)-min(x)))
}
sampleknn_n <- as.data.frame(lapply(sampleknn[ ,c(1,2,3,4,5)], normalize))
str(sampleknn_n)
#train the data from sampleknn_n
sampleknn_train <- sampleknn_n[1:65000, ]
#create a test dataset
sampleknn_test <- sampleknn_n[65001:65499, ]
#isolate test and train satisfaction levels
sampleknn_train_target <- sampleknn[1:65000, 6]
sampleknn_test_target <- sampleknn[65001:65499, 6]
#-----knn model
sqrt(65499)
m1 <- knn(train=sampleknn_train, test=sampleknn_test, cl=sampleknn_test_target,k=255)
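Now, when I run the last line (m1 <- ...), it gives the error 'train' and 'class' have different lengths. I have looked for answers to the same problem, but nothing seems to work for me. How can I fix this? Let me know if you need more information.
Edit:
Before normalization: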
RequestorSeniority ITOwner Severity Priority daysOpen Satisfaction
1 50 2 0 3 Unsatisfied
2 15 1 1 5 Unsatisfied
2 15 2 0 0 Unknown
4 22 2 0 20 Unknown
1 22 2 1 1 Unsatisfied
4 38 2 3 0 Unknown
After normalization:
RequestorSeniority ITOwner Severity Priority daysOpen
0.0000000000 1.0000000000 0.50 0.0000000000 0.05555555556
0.3333333333 0.2857142857 0.25 0.3333333333 0.09259259259
0.3333333333 0.2857142857 0.50 0.0000000000 0.00000000000
1.0000000000 0.4285714286 0.50 0.0000000000 0.37037037037
0.0000000000 0.4285714286 0.50 0.3333333333 0.01851851852
1.0000000000 0.7551020408 0.50 1.0000000000 0.00000000000
> dput(head(sampleknn_n))
structure(list(RequestorSeniority = c(0, 0.333333333333333, 0.333333333333333,
1, 0, 1), ITOwner = c(1, 0.285714285714286, 0.285714285714286,
0.428571428571429, 0.428571428571429, 0.755102040816326), Severity = c(0.5,
0.25, 0.5, 0.5, 0.5, 0.5), Priority = c(0, 0.333333333333333,
0, 0, 0.333333333333333, 1), daysOpen = c(0.0555555555555556,
0.0925925925925926, 0, 0.37037037037037, 0.0185185185185185,
0)), .Names = c("RequestorSeniority", "ITOwner", "Severity",
"Priority", "daysOpen"), row.names = c(NA, 6L), class = "data.frame")
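Answer 0 (score: 0)
From ?knn:
cl: factor of true classifications of training set
So cl must hold the labels of the training rows, not the test rows. You passed sampleknn_test_target (499 labels) alongside 65,000 training rows, which is why the lengths differ. Your call should be: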
m1 <- knn(train=sampleknn_train, test=sampleknn_test, cl=sampleknn_train_target,k=255)
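To make the length requirement concrete, here is a minimal sketch on synthetic data. HelpDesk.csv is not available here, so the data frame toy, its random values, the 150/50 split, and k = 13 are all assumptions chosen for illustration; only the column names follow the question. It checks that cl has one label per training row before calling knn, and uses the test labels only afterwards to evaluate the predictions.

```r
library(class)  # provides knn()

set.seed(42)

# Hypothetical stand-in for HelpDesk.csv: 5 integer predictors and a factor label.
n <- 200
toy <- data.frame(
  RequestorSeniority = sample(1:4, n, replace = TRUE),
  ITOwner            = sample(1:50, n, replace = TRUE),
  Severity           = sample(0:4, n, replace = TRUE),
  Priority           = sample(0:3, n, replace = TRUE),
  daysOpen           = sample(0:54, n, replace = TRUE),
  Satisfaction       = factor(sample(
    c("Amazing", "Satisfied", "Unknown", "Unsatisfied"), n, replace = TRUE))
)

# Same min-max scaling as in the question.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
toy_n <- as.data.frame(lapply(toy[, 1:5], normalize))

# Arbitrary split for this sketch: first 150 rows train, last 50 test.
train_idx <- 1:150
test_idx  <- 151:200

toy_train <- toy_n[train_idx, ]
toy_test  <- toy_n[test_idx, ]
toy_train_target <- toy$Satisfaction[train_idx]
toy_test_target  <- toy$Satisfaction[test_idx]

# knn() needs exactly one class label per training row.
stopifnot(nrow(toy_train) == length(toy_train_target))

# cl takes the TRAINING labels; k = 13 is an arbitrary choice here.
pred <- knn(train = toy_train, test = toy_test, cl = toy_train_target, k = 13)

# The test labels are used only after prediction, to evaluate the result.
print(table(predicted = pred, actual = toy_test_target))
```

Passing toy_test_target as cl in this sketch gives the same kind of length mismatch as in the question, because it has 50 labels against 150 training rows.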