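What changes when we supply labels for a factor column? In the code below, I first assigned the labels 0 and 1, and the second time I assigned the labels 0 and 10^6. As far as I know, labels just give alternative names to the categories, in this case Male and Female. Note that I supplied numeric labels, not character labels.
It looks as if the labels give the categories some kind of numeric weight that changes the Euclidean distance between data points. The code in question and the two corresponding results are below.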
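A minimal sketch of what factor() does with numeric labels, using a made-up three-element vector (gender) in place of dataset$Gender. As far as I understand the factor() documentation, numeric labels are converted to character strings and become the level names, while the internal integer codes are the same for both labelings.

```r
# Hypothetical toy vector standing in for dataset$Gender
gender <- c("Male", "Female", "Male")

g01  <- factor(gender, levels = c("Female", "Male"), labels = c(0, 1))
g1e6 <- factor(gender, levels = c("Female", "Male"), labels = c(0, 10^6))

levels(g01)    # the numeric labels are stored as the strings "0" and "1"
levels(g1e6)   # here the strings are "0" and "1e+06"

# The internal integer codes (1 = first level, 2 = second) are identical
as.integer(g01)
as.integer(g1e6)

# Only when the label text is parsed back as a number does 10^6 reappear
as.numeric(as.character(g1e6))   # 1e6, 0, 1e6
```

So the factor itself only carries names; any numeric meaning appears only if something later converts the label text back into numbers.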
Dataset
> head(dataset)
User.ID Gender Age EstimatedSalary Purchased
1 15624510 1 19 19000 0
2 15810944 1 35 20000 0
3 15668575 0 26 43000 0
4 15603246 0 27 57000 0
5 15804002 1 19 76000 0
6 15728773 1 27 58000 0
R code with labels = c(0, 1)
dataset <- read.csv("~/Desktop/Machine Learning /ML_16/Social_Network_Ads.csv")
dataset$Gender <- factor(dataset$Gender , levels = c("Female","Male") , labels = c(0 , 1))
library(caTools)
set.seed(1231)
sample_split <- sample.split(dataset$Gender , SplitRatio = 0.8)
training_dataset <- subset(dataset , sample_split == TRUE)
testing_dataset <- subset(dataset , sample_split == FALSE)
library(class)
model_classifier <- knn(train = training_dataset[,-5] , test = testing_dataset[,-5] , cl = training_dataset$Purchased , k = 21 )
library(caret)
confusionMatrix(table(model_classifier , testing_dataset$Purchased))
Result
Confusion Matrix and Statistics
model_classifier  0  1
               0 47 18
               1  4 11
Accuracy : 0.725
R code with labels = c(0, 10^6)
dataset <- read.csv("~/Desktop/Machine Learning /ML_16/Social_Network_Ads.csv")
dataset$Gender <- factor(dataset$Gender , levels = c("Female","Male") , labels = c(0 , 10^6))
library(caTools)
set.seed(1231)
sample_split <- sample.split(dataset$Gender , SplitRatio = 0.8)
training_dataset <- subset(dataset , sample_split == TRUE)
testing_dataset <- subset(dataset , sample_split == FALSE)
library(class)
model_classifier <- knn(train = training_dataset[,-5] , test = testing_dataset[,-5] , cl = training_dataset$Purchased , k = 21 )
library(caret)
confusionMatrix(table(model_classifier , testing_dataset$Purchased))
Result
Confusion Matrix and Statistics
model_classifier  0  1
               0 50 23
               1  1  6
Accuracy : 0.7
What exactly are labels? If we supply numeric labels, do they carry the same mathematical meaning as the numbers themselves?
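A possible explanation, sketched under assumptions: as far as I can tell from the source of the class package, knn() calls as.matrix() on train and test and later passes them to compiled code as doubles (worth checking against your installed version). Because training_dataset[,-5] contains a factor column, as.matrix() yields a character matrix, and converting that to double parses the label text "0"/"1" or "0"/"1e+06" as real numbers. The two-row data frame below uses made-up values (not rows of Social_Network_Ads.csv), and the helpers coerce_like_knn() and row_distance() are hypothetical, only for illustration.

```r
# Hypothetical helper: builds a two-row toy data frame whose Gender factor uses
# the given numeric labels, then mimics the as.matrix()/as.double() coercion.
coerce_like_knn <- function(gender_labels) {
  df <- data.frame(
    Gender = factor(c("Male", "Female"), levels = c("Female", "Male"),
                    labels = gender_labels),
    Age = c(19, 35),
    EstimatedSalary = c(19000, 20000)
  )
  m <- as.matrix(df)  # a factor column forces a character matrix
  # Parse every cell back to a number, keeping the column names
  matrix(as.double(m), nrow = nrow(m), dimnames = list(NULL, colnames(m)))
}

# Hypothetical helper: Euclidean distance between the two toy rows
row_distance <- function(x) sqrt(sum((x[1, ] - x[2, ])^2))

x01  <- coerce_like_knn(c(0, 1))
x1e6 <- coerce_like_knn(c(0, 10^6))

x01[, "Gender"]     # Gender becomes 1 and 0
x1e6[, "Gender"]    # Gender becomes 1e6 and 0
row_distance(x01)   # about 1000.13, driven mostly by EstimatedSalary
row_distance(x1e6)  # about 1000000.5, driven almost entirely by Gender
```

If this reading is right, the same coercion applies to the real training_dataset[,-5], so with labels c(0, 10^6) the Gender column can dominate which 21 neighbours are selected.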