Using R

Time: 2019-03-10 05:43:22

Tags: r machine-learning classification knn

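I am trying to fit a KNN model and obtain the decision boundary using the Auto data set from the ISLR package in R.

I am having difficulty identifying the decision boundary for a 3-class problem. Here is my code so far. It does not produce a decision boundary.

I have seen answers elsewhere on this site that solve this kind of problem with ggplot, but I would like to get the result the classic way, using the plot function.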

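library("ISLR")

# Predictors: columns 1 and 3 of Auto (mpg, displacement)
# Response: column 8 (origin, coded 1, 2, 3)
trainxx <- Auto[, c(1, 3)]
trainyy <- Auto[, 8]

n.grid1 <- 50

# Evenly spaced grid over the range of each predictor
x1.grid1 <- seq(from = min(trainxx[, 1]), to = max(trainxx[, 1]), length.out = n.grid1)
x2.grid1 <- seq(from = min(trainxx[, 2]), to = max(trainxx[, 2]), length.out = n.grid1)
grid <- expand.grid(x1.grid1, x2.grid1)

library("class")
mod.opt <- knn(trainxx, grid, trainyy, k = 10, prob = TRUE)

# Proportion of votes for the winning class at each grid point
prob_knn <- attr(mod.opt, "prob")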

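My problem is mainly after this code. I am fairly sure I have to modify the following part, but I do not know how. Do I need to use "nested conditions" here?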

prob_knn <- ifelse(mod.opt == "3", prob_knn, 1 - prob_knn) 



prob_knn <- matrix(prob_knn, n.grid1, n.grid1)


plot(trainxx, col = ifelse(trainyy == "3", "green", ifelse(trainyy == "2", "red", "blue")))
title(main = "plot of training data with Decision boundary K=10")
contour(x1.grid1, x2.grid1, prob_knn, levels = 0.5, labels = "", xlab = "", ylab = "",
        main = "", add = TRUE, pch = 20)

Any suggestions for solving this would be a great help.

Basically, I need something like this, but for a 3-class problem: https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o

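2 Answers:

Answer 0: (score: 3)

Here is a tweaked approach that draws the decision boundaries as lines. I thought this would require the predicted probability for each class, but after reading this answer, it turns out you can simply set the value for each class to 1 wherever that class is predicted and to 0 everywhere else.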

# Create matrices for each class where p = 1 for any point
#   where that class was predicted, 0 otherwise
n_classes = 3
class_regions = lapply(1:n_classes, function(class_num) {
    indicator = ifelse(mod.opt == class_num, 1, 0)
    matrix(indicator, n.grid1, n.grid1)
})

# Set up colours
class_colors = c("#4E79A7", "#F28E2B", "#E15759")
# Add some transparency to make the fill colours less bright
fill_colors = paste0(class_colors, "60")

# Use image to plot the predicted class at each point
classes = matrix(as.numeric(mod.opt), n.grid1, n.grid1)
image(x1.grid1, x2.grid1, classes, col = fill_colors, 
      main = "plot of training data with decision boundary",
      xlab = colnames(trainxx)[1], ylab = colnames(trainxx)[2])
# Draw contours separately for each class; the 0.5 level of a 0/1
#   indicator matrix runs along the edge of that class's region
for (class_num in 1:n_classes) {
    contour(x1.grid1, x2.grid1, class_regions[[class_num]], 
            col = class_colors[class_num],
            levels = 0.5, add = TRUE, lwd = 2, drawlabels = FALSE)
}
# Using pch = 21 for bordered points that stand out a bit better
points(trainxx, bg = class_colors[trainyy], 
       col = "black",
       pch = 21)

Resulting plot:

Plot with lines for decision boundaries
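To make the indicator-matrix idea concrete, here is a minimal sketch on a tiny hand-made grid. The 4 x 4 `pred` vector is hypothetical and does not come from the Auto data. It uses `contourLines()` from grDevices, which computes contour paths without drawing anything, to show that each class's 0/1 matrix has a 0.5 level line wherever that class meets another.

```r
# Hypothetical predicted classes on a 4 x 4 grid (filled column by column,
# the same way matrix() fills the knn() output in the answer above)
n <- 4
pred <- factor(c(1, 1, 2, 2,
                 1, 1, 2, 2,
                 3, 3, 2, 2,
                 3, 3, 3, 2))

# One 0/1 indicator matrix per class
indicators <- lapply(levels(pred), function(cl) {
  matrix(as.numeric(pred == cl), n, n)
})
names(indicators) <- levels(pred)

# Every grid point is assigned to exactly one class,
# so the indicator matrices add up to 1 everywhere
total <- Reduce(`+`, indicators)

# Compute (without plotting) the 0.5 contour of the class "1" indicator
lines_class1 <- contourLines(1:n, 1:n, indicators[["1"]], levels = 0.5)
```

Because each matrix only contains 0 and 1, any level strictly between the two would trace the same region edge; 0.5 is just the conventional choice.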

Answer 1: (score: 0)

I think that, rather than trying to draw the decision boundary as a line, it may be easier to just take the predicted class at each point of the grid and plot it as filled regions:

# Use the predicted class at each point
classes = matrix(as.numeric(mod.opt), n.grid1, n.grid1)
class_colors = c("#4E79A7", "#F28E2B", "#E15759")
# Add some transparency to make the fill colours less bright
fill_colors = paste0(class_colors, "88")
# Use image to plot the predicted class at each point
image(x1.grid1, x2.grid1, classes, col = fill_colors, 
      main = "plot of training data with decision boundary",
      xlab = colnames(trainxx)[1], ylab = colnames(trainxx)[2])
points(trainxx, col = class_colors[trainyy], pch = 16)

Note that here I have increased n.grid1 in your code to 200, to get detailed borders for the regions.

Output:

Plot with filled regions
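To see what the grid size changes, the following is a rough, self-contained sketch that compares a 50 x 50 grid with a 200 x 200 grid. It makes two assumptions that are not part of the answer: it uses the built-in iris data as a stand-in for Auto so that it runs without the ISLR package, and `predict_grid` is a hypothetical helper written only for this illustration.

```r
library(class)  # provides knn()

# Stand-in data (assumption): two iris predictors and the 3-level Species factor
train_x <- iris[, c("Petal.Length", "Petal.Width")]
train_y <- iris$Species

# Hypothetical helper: predict the class on an n x n grid over the data range
predict_grid <- function(n, k = 10) {
  x1 <- seq(min(train_x[, 1]), max(train_x[, 1]), length.out = n)
  x2 <- seq(min(train_x[, 2]), max(train_x[, 2]), length.out = n)
  grid <- expand.grid(x1, x2)
  pred <- knn(train_x, grid, train_y, k = k)
  list(x1 = x1, x2 = x2, classes = matrix(as.numeric(pred), n, n))
}

coarse <- predict_grid(50)
fine <- predict_grid(200)

# Plot both resolutions side by side with the same colours
fill_colors <- paste0(c("#4E79A7", "#F28E2B", "#E15759"), "88")
op <- par(mfrow = c(1, 2))
# zlim = c(1, 3) pins class 1, 2, 3 to the first, second, third colour
image(coarse$x1, coarse$x2, coarse$classes, col = fill_colors, zlim = c(1, 3),
      main = "n = 50", xlab = names(train_x)[1], ylab = names(train_x)[2])
image(fine$x1, fine$x2, fine$classes, col = fill_colors, zlim = c(1, 3),
      main = "n = 200", xlab = names(train_x)[1], ylab = names(train_x)[2])
par(op)
```

The finer grid costs more predictions (200 x 200 = 40,000 grid points instead of 2,500), which is the trade-off for smoother region borders.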