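I am trying to plot a nonlinear decision boundary, which should look something like this:
I have fitted a regularized nonlinear logistic regression of the form:
Here is an excerpt of my data: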
ones test1 test2 use
1 1 0.051267 0.69956 1
2 1 -0.092742 0.68494 1
3 1 -0.213710 0.69225 1
4 1 -0.375000 0.50219 1
5 1 -0.513250 0.46564 1
6 1 -0.524770 0.20980 1
These are the parameters I computed with the optim() function:
[1] 0.377980476 -0.085951551 0.445140731
[4] -1.953080687 -0.506554404 -0.330330236
[7] 0.414649938 0.270281786 0.183804530
[10] -0.155359467 -0.753665545 0.351880543
[13] 0.238052214 0.619714119 -0.582420943
[16] 0.150625144 0.266319363 -0.331130949
[19] 0.177759335 -0.005402135 -0.124253913
[22] 0.085607070 0.580258782 0.973785263
[25] 0.387313615 0.237754576 -0.011198804
[28] -0.514447404
I am still new to R and really have no idea how to approach this. Can anyone help me?
ones test1 test2 use
1 1 0.0512670 0.699560 1
2 1 -0.0927420 0.684940 1
3 1 -0.2137100 0.692250 1
4 1 -0.3750000 0.502190 1
5 1 -0.5132500 0.465640 1
6 1 -0.5247700 0.209800 1
7 1 -0.3980400 0.034357 1
8 1 -0.3058800 -0.192250 1
9 1 0.0167050 -0.404240 1
10 1 0.1319100 -0.513890 1
11 1 0.3853700 -0.565060 1
12 1 0.5293800 -0.521200 1
13 1 0.6388200 -0.243420 1
14 1 0.7367500 -0.184940 1
15 1 0.5466600 0.487570 1
16 1 0.3220000 0.582600 1
17 1 0.1664700 0.538740 1
18 1 -0.0466590 0.816520 1
19 1 -0.1733900 0.699560 1
20 1 -0.4786900 0.633770 1
21 1 -0.6054100 0.597220 1
22 1 -0.6284600 0.334060 1
23 1 -0.5938900 0.005117 1
24 1 -0.4210800 -0.272660 1
25 1 -0.1157800 -0.396930 1
26 1 0.2010400 -0.601610 1
27 1 0.4660100 -0.535820 1
28 1 0.6733900 -0.535820 1
29 1 -0.1388200 0.546050 1
30 1 -0.2943500 0.779970 1
31 1 -0.2655500 0.962720 1
32 1 -0.1618700 0.801900 1
33 1 -0.1733900 0.648390 1
34 1 -0.2828300 0.472950 1
35 1 -0.3634800 0.312130 1
36 1 -0.3001200 0.027047 1
37 1 -0.2367500 -0.214180 1
38 1 -0.0639400 -0.184940 1
39 1 0.0627880 -0.163010 1
40 1 0.2298400 -0.411550 1
41 1 0.2932000 -0.228800 1
42 1 0.4832900 -0.184940 1
43 1 0.6445900 -0.141080 1
44 1 0.4602500 0.012427 1
45 1 0.6273000 0.158630 1
46 1 0.5754600 0.268270 1
47 1 0.7252300 0.443710 1
48 1 0.2240800 0.524120 1
49 1 0.4429700 0.670320 1
50 1 0.3220000 0.692250 1
51 1 0.1376700 0.575290 1
52 1 -0.0063364 0.399850 1
53 1 -0.0927420 0.553360 1
54 1 -0.2079500 0.355990 1
55 1 -0.2079500 0.173250 1
56 1 -0.4383600 0.217110 1
57 1 -0.2194700 -0.016813 1
58 1 -0.1388200 -0.272660 1
59 1 0.1837600 0.933480 0
60 1 0.2240800 0.779970 0
61 1 0.2989600 0.619150 0
62 1 0.5063400 0.758040 0
63 1 0.6157800 0.728800 0
64 1 0.6042600 0.597220 0
65 1 0.7655500 0.502190 0
66 1 0.9268400 0.363300 0
67 1 0.8231600 0.275580 0
68 1 0.9614100 0.085526 0
69 1 0.9383600 0.012427 0
70 1 0.8634800 -0.082602 0
71 1 0.8980400 -0.206870 0
72 1 0.8519600 -0.367690 0
73 1 0.8289200 -0.521200 0
74 1 0.7943500 -0.557750 0
75 1 0.5927400 -0.740500 0
76 1 0.5178600 -0.594300 0
77 1 0.4660100 -0.418860 0
78 1 0.3508100 -0.579680 0
79 1 0.2874400 -0.769740 0
80 1 0.0858290 -0.755120 0
81 1 0.1491900 -0.579680 0
82 1 -0.1330600 -0.448100 0
83 1 -0.4095600 -0.411550 0
84 1 -0.3922800 -0.258040 0
85 1 -0.7436600 -0.258040 0
86 1 -0.6975800 0.041667 0
87 1 -0.7551800 0.290200 0
88 1 -0.6975800 0.684940 0
89 1 -0.4038000 0.706870 0
90 1 -0.3807600 0.918860 0
91 1 -0.5074900 0.904240 0
92 1 -0.5478100 0.706870 0
93 1 0.1031100 0.779970 0
94 1 0.0570280 0.918860 0
95 1 -0.1042600 0.991960 0
96 1 -0.0812210 1.108900 0
97 1 0.2874400 1.087000 0
98 1 0.3968900 0.823830 0
99 1 0.6388200 0.889620 0
100 1 0.8231600 0.663010 0
101 1 0.6733900 0.641080 0
102 1 1.0709000 0.100150 0
103 1 -0.0466590 -0.579680 0
104 1 -0.2367500 -0.638160 0
105 1 -0.1503500 -0.367690 0
106 1 -0.4902100 -0.301900 0
107 1 -0.4671700 -0.133770 0
108 1 -0.2885900 -0.060673 0
109 1 -0.6111800 -0.067982 0
110 1 -0.6630200 -0.214180 0
111 1 -0.5996500 -0.418860 0
112 1 -0.7263800 -0.082602 0
113 1 -0.8300700 0.312130 0
114 1 -0.7206200 0.538740 0
115 1 -0.5938900 0.494880 0
116 1 -0.4844500 0.999270 0
117 1 -0.0063364 0.999270 0
118 1 0.6326500 -0.030612 0
Answer 0 (score: 1)
This is not an ideal answer, but you can use an SVM
model to visualize the boundary (it gives an in-sample error of about 0.83):
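library(e1071)
# Keep only the label and the two features
data = data[, c("use", "test1", "test2")]
# svm() runs a regression on a numeric response; a factor makes it a classifier, which plot() requires
data$use = factor(data$use)
fit = svm(use ~ ., data = data)
plot(fit, data = data)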
With a simple transformation, we can try to obtain a linearly separable data set:
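# Square both features so that a roughly elliptical class boundary becomes a straight line
data2 = data.frame(
  y = factor(data[, "use"]),
  x1 = data[, "test1"]^2,
  x2 = data[, "test2"]^2 )
# glm() is part of base R (package stats), so no extra package is needed
fit = glm(y ~ x2 + x1, data = data2, family = binomial(link = "logit"))
plot(x2 ~ x1, data = data2, bg = as.numeric(y) + 1, pch = 21, main = "Logistic regression on Y ~ X1 + X2")
# Boundary b0 + b_x2 * x2 + b_x1 * x1 = 0 solved for x2: intercept -b0/b_x2, slope -b_x1/b_x2
abline(-fit$coefficients[1]/fit$coefficients[2], -fit$coefficients[3]/fit$coefficients[2], col = 'blue', lwd = 2)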
This gives you the following (in-sample error of about 0.73):
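The in-sample figure quoted above can be checked along the following lines. This is only a sketch: `toy` is a synthetic stand-in generated here (an assumption, not the asker's data), and the names `fit_toy`, `p_hat` and `pred` are illustrative. Computing accuracy and error side by side makes it clear which of the two a number such as 0.73 refers to.

```r
# Synthetic stand-in for the question's data (an assumption, not the real measurements):
# class 1 tends to lie near the origin, class 0 further out.
set.seed(1)
n <- 200
toy <- data.frame(test1 = runif(n, -1, 1), test2 = runif(n, -1, 1))
toy$use <- rbinom(n, 1, plogis(4 - 10 * (toy$test1^2 + toy$test2^2)))

# Same transformation as in the answer: square both features, then fit a logistic regression.
toy2 <- data.frame(y = factor(toy$use), x1 = toy$test1^2, x2 = toy$test2^2)
fit_toy <- glm(y ~ x2 + x1, data = toy2, family = binomial(link = "logit"))

# Fitted probability of the second factor level ("1"), thresholded at 0.5.
p_hat <- predict(fit_toy, type = "response")
pred <- ifelse(p_hat > 0.5, "1", "0")

# In-sample accuracy, its complement (the error rate), and the confusion table.
accuracy <- mean(pred == as.character(toy2$y))
in_sample_error <- 1 - accuracy
confusion <- table(predicted = pred, actual = toy2$y)

print(confusion)
cat("accuracy:", accuracy, " error:", in_sample_error, "\n")
```

To run this on the real data, replace `toy` with the data frame from the question; the rest should carry over unchanged as long as the columns are named `test1`, `test2` and `use`.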
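So now you have the linear predictor (the log-odds)
Y = w0 + w1 * test1^2 + w2 * test2^2
and the decision boundary is where Y = 0. Solving for test2 gives test2 = f(test1):
test2 = +/- sqrt(-(w0 + w1 * test1^2) / w2)
which you can use to plot the nonlinear boundary in the original coordinates.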
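As a rough sketch of that last step, the boundary can be obtained by setting Y = 0 and solving for test2, which gives two branches, test2 = +/- sqrt(-(w0 + w1 * test1^2) / w2). The data below are a synthetic stand-in (an assumption, not the asker's data), and `boundary_upper`, `t1_grid` and `fit_toy` are hypothetical names used only for illustration.

```r
# Synthetic stand-in data (an assumption): class 1 is concentrated around the origin.
set.seed(1)
n <- 200
toy <- data.frame(test1 = runif(n, -1, 1), test2 = runif(n, -1, 1))
toy$use <- rbinom(n, 1, plogis(4 - 10 * (toy$test1^2 + toy$test2^2)))

# Fit on the squared features, as in the answer.
fit_toy <- glm(use ~ I(test1^2) + I(test2^2), data = toy, family = binomial)
w <- unname(coef(fit_toy))  # w[1] = w0, w[2] = w1 (test1^2), w[3] = w2 (test2^2)

# Solve w0 + w1 * t1^2 + w2 * t2^2 = 0 for t2 (upper branch).
# Returns NA where the equation has no real solution for the given t1.
boundary_upper <- function(t1) {
  radicand <- -(w[1] + w[2] * t1^2) / w[3]
  ifelse(radicand >= 0, sqrt(pmax(radicand, 0)), NA_real_)
}

t1_grid <- seq(-1, 1, length.out = 401)
upper <- boundary_upper(t1_grid)

# Plot the points in the original coordinates and overlay both branches of the boundary.
plot(test2 ~ test1, data = toy, bg = toy$use + 2, pch = 21,
     main = "Decision boundary in the original feature space")
lines(t1_grid, upper, col = "blue", lwd = 2)
lines(t1_grid, -upper, col = "blue", lwd = 2)
```

The closed form only works because this model has a single squared term in test2. For a model with many polynomial terms, such as the 28-parameter fit in the question, a common alternative is to evaluate the linear predictor on a grid of (test1, test2) values and draw its zero level with `contour(..., levels = 0)`.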