我从knn
函数中获得了一系列建模类标签。我有一个带有基本数字训练数据的数据框,以及另一个用于测试数据的数据框。我如何为knn
函数的返回值绘制决策边界?我必须在锁定的机器上复制我的发现,所以请尽可能限制第三方库的使用。
我只有两个班级标签,“橙色”和“蓝色”。它们用训练数据绘制在简单的2D图上。同样,我只想在knn
函数的结果周围画一个边界。
代码:
library(class)
n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)
set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)
k <- knn(train.df, test.df, classes, k=25)
plot(test.df, col=k)
classes
只是一个由早期代码确定的类标签的向量。
如果您需要,下面是我工作的完整代码:
library(class)
n <- 100
set.seed(1)
x <- round(runif(n, 1, n))
set.seed(2)
y <- round(runif(n, 1, n))
# ============================================================
# Bayes Classifier + Decision Boundary Code
# ============================================================
classes <- "null"
colours <- "null"
for (i in 1:n)
{
# P(C = j | X = x, Y = y) = prob
# "The probability that the class (C) is orange (j) when X is some x, and Y is some y"
# Two predictors that influence classification: x, y
# If x and y are both under 50, there is a 90% chance of being orange (grouping)
# If x and y and both over 50, or if one of them is over 50, grouping is blue
# Algorithm favours whichever grouping has a higher chance of success, then plots using that colour
# When prob (from above) is 50%, the boundary is drawn
percentChance <- 0
if (x[i] < 50 && y[i] < 50)
{
# 95% chance of orange and 5% chance of blue
# Bayes Decision Boundary therefore assigns to orange when x < 50 and y < 50
# "colours" is the Decision Boundary grouping, not the plotted grouping
percentChance <- 95
colours[i] <- "orange"
}
else
{
percentChance <- 10
colours[i] <- "blue"
}
if (round(runif(1, 1, 100)) > percentChance)
{
classes[i] <- "blue"
}
else
{
classes[i] <- "orange"
}
}
boundary.x <- seq(0, 100, by=1)
boundary.y <- 0
for (i in 1:101)
{
if (i > 49)
{
boundary.y[i] <- -10 # just for the sake of visual consistency, real value is 0
}
else
{
boundary.y[i] <- 50
}
}
df <- data.frame(boundary.x, boundary.y)
plot(x, y, col=classes)
lines(df, type="l", lty=2, lwd=2, col="red")
# ============================================================
# K-Nearest neighbour code
# ============================================================
#library(class)
#n <- 100
#set.seed(1)
#x <- round(runif(n, 1, n))
#set.seed(2)
#y <- round(runif(n, 1, n))
train.df <- data.frame(x, y)
set.seed(1)
x.test <- round(runif(n, 1, n))
set.seed(2)
y.test <- round(runif(n, 1, n))
test.df <- data.frame(x.test, y.test)
k <- knn(train.df, test.df, classes, k=25)
plot(test.df, col=k)
答案 0 :(得分:5)
获取网格上的类概率预测,并在P = 0.5处绘制轮廓线(或者您想要截止点的任何值)。这也是Venables和Ripley的经典MASS教科书以及Hastie,Tibshirani和Friedman的统计学习元素中使用的方法。
RestAssurd