Question

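I specify 'n' points and label each of them +1 or -1. I store all of this in a dictionary that looks like {'point1': [(0.565, -0.676), +1], ...}. I am trying to find a line that separates them, i.e. points labeled +1 above the line and points labeled -1 below it. Can anyone help?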

I am trying to use w = w + y(r) as the "learning algorithm", where w is the weight vector, y is +1 or -1, and r is the point.

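The code runs, but the separating line is not accurate: it does not separate the points correctly. Also, the line gets worse as I increase the number of points to separate.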

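If you run the code, the green line should be the separating line. The closer its slope is to that of the blue line (the perfect line by definition), the better.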

Answer 1

For example, you can use the SGDClassifier from scikit-learn (sklearn). A linear classifier computes its predictions as follows (see the source code):

def predict(self, X):
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

where decision_function is given by:

def decision_function(self, X):
    [...]

    scores = safe_sparse_dot(X, self.coef_.T,
                             dense_output=True) + self.intercept_
    return scores.ravel() if scores.shape[1] == 1 else scores

So, for the two-dimensional case of your example, this means that a data point is classified as +1 if

x*w1 + y*w2 + i > 0

where

x, y = X
w1, w2 = self.coef_
i = self.intercept_

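and as -1 otherwise. The decision therefore depends on whether x*w1 + y*w2 + i is greater than or less than (or equal to) zero, so the "boundary" is found by setting x*w1 + y*w2 + i == 0. We are free to choose one of the two components; the other is then determined by this equation.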
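As a small illustration of that last step, here is a minimal sketch that solves x*w1 + y*w2 + i == 0 for y at a chosen x. The helper name boundary_y and the numeric coefficients are made up for this example; they are not part of scikit-learn.

```python
def boundary_y(x, w1, w2, i):
    """Return y such that x*w1 + y*w2 + i == 0 (requires w2 != 0)."""
    if w2 == 0:
        raise ValueError("w2 == 0: the boundary is vertical, solve for x instead")
    return -(x * w1 + i) / w2


# Hypothetical coefficients, chosen only for illustration.
w1, w2, i = 2.0, -1.0, 0.5
x = 1.0
y = boundary_y(x, w1, w2, i)

# The point (x, y) lies on the boundary, so its score is zero.
score = x * w1 + y * w2 + i
```

Evaluating boundary_y at two different x values gives two points on the boundary, which is enough to draw the line.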

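The following snippet fits an SGDClassifier and plots the resulting "boundary". It assumes that the data points are scattered around the origin (x, y = 0, 0), i.e. that their mean is (approximately) zero. In practice, to get good results, you should first subtract the mean from the data points, then perform the fit, and then add the mean back to the result. The following snippet simply scatters the points around the origin.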

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import SGDClassifier

n = 100
x = np.random.uniform(-1, 1, size=(n, 2))

# We assume the points are scattered around zero.
b = np.zeros(2)
d = np.random.uniform(-1, 1, size=2)
slope, intercept = (d[1] / d[0]), 0.

fig, ax = plt.subplots(figsize=(8,8))
ax.scatter(x[:, 0], x[:, 1], color = 'black')
ax.plot([b[0], d[0]], [b[1], d[1]], 'b-', label='Ideal')

labels = []
for point in x:
    if(point[1] > (slope * point[0] + intercept)):
        ax.annotate('+', xy=point, xytext=(0, -10), textcoords='offset points', color = 'blue', ha='center', va='center')
        labels.append(1)
    else:
        ax.annotate('--', xy=point, xytext=(0, -10), textcoords='offset points', color = 'red', ha='center', va='center')
        labels.append(-1)

labels = np.array(labels)
classifier = SGDClassifier()
classifier.fit(x, labels)

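# Take two points on the boundary x1*w1 + x2*w2 + i == 0:
# one at x1 = 0 and one at a random x1.
# intercept_ is indexed with [0] so the results are scalars, not 1-element arrays.
w1, w2 = classifier.coef_[0]
i = classifier.intercept_[0]
x1 = np.random.uniform(-1, 1)
x2_at_0 = -i / w2
x2 = (-i - x1 * w1) / w2

# Starting the segment at (0, -i/w2) instead of (0, 0) keeps it on the boundary when i != 0.
ax.plot([0, x1], [x2_at_0, x2], 'g--', label='Fit')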

plt.legend()
plt.show()
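The advice about subtracting the mean before the fit and adding it back afterwards can be sketched as follows. This is only an illustration under stated assumptions: the data offset, the weights w, and the intercept i_centred are made-up stand-ins for what a classifier fitted on the centred data would return (coef_[0] and intercept_[0]); no actual fit is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data that is NOT centred on the origin.
points = rng.uniform(-1, 1, size=(100, 2)) + np.array([3.0, -2.0])

mean = points.mean(axis=0)
centred = points - mean  # fit the classifier on this instead of `points`

# Made-up stand-ins for coef_[0] and intercept_[0] of a fit on `centred`.
w = np.array([0.8, -1.2])
i_centred = 0.1

# w . (p - mean) + i_centred == 0  <=>  w . p + (i_centred - w . mean) == 0
i_original = i_centred - w @ mean

# Check: take a point on the centred boundary and shift it back by the mean.
q = np.array([0.5, -(0.5 * w[0] + i_centred) / w[1]])
p = q + mean
residual = w @ p + i_original  # should be zero up to rounding
```

The weights are unchanged by the shift; only the intercept needs to be corrected when moving back to the original coordinates.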

This figure shows the result for n = 100 data points:

The following figure shows the results for different n, where the points were chosen at random from a pool of 1000 data points:

Answer 2

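This is the answer I came up with. A few things I realized along the way:
The w = w + y(r) algorithm only works for normalized vectors. 'w' is the weight vector, 'r' is the point in question [x, y], and 'y' is the sign of the label.
You can find the slope and intercept from the resulting vector 'w' by putting the coefficients in the form ax + by + c = 0 and solving for 'y'.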

import numpy as np

# pts10 (an iterable of (x, y) pairs) and label_dict are assumed to be
# defined elsewhere; they are not shown in this answer.
w = np.array([0, 0, 0])
restart = True
while restart:
    ii = 0
    restart = False
    for x, y in pts10:
        ii += 1
        r = np.array([x, y, 1])  # augmented point: the trailing 1 carries the intercept
        label = int(label_dict['point{}'.format(ii)][1])

        if np.dot(w, r) >= 0 and label >= 0:
            print("Point " + str(ii) + " is correctly above the line --> no adjustments")
        elif np.dot(w, r) < 0 and label < 0:
            print("Point " + str(ii) + " is correctly below the line --> no adjustments")
        elif np.dot(w, r) >= 0 and label < 0:
            print("Point " + str(ii) + " should have been below the line")
            w = np.subtract(w, r)
            restart = True  # start over from the first point
            break
        elif np.dot(w, r) < 0 and label >= 0:
            print("Point " + str(ii) + " should have been above the line")
            w = np.add(w, r)
            restart = True  # start over from the first point
            break
        else:
            print("THERE IS AN ERROR, A POINT PASSED THROUGH HERE")

print(w)
# w[0]*x + w[1]*y + w[2] = 0  =>  y = (-w[0]/w[1])*x + (-w[2]/w[1])
slope_w = (-w[0]) / w[1]
intercept_w = (-w[2]) / w[1]
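For readers who want to try the update rule without the question's data, here is a minimal self-contained sketch of the same w = w + y*r idea. It differs from the code above in a few ways that should be read as assumptions: it sweeps over all points once per epoch instead of restarting after every mistake, it generates its own points from a hypothetical line y = 0.5*x + 0.1, and it discards points closer than 0.1 to that line so that the data are separable with a margin and the loop is guaranteed to stop. The names perceptron and max_epochs are invented for this example.

```python
import numpy as np


def perceptron(points, labels, max_epochs=1000):
    """Return (w, converged) for the rule sign(w . [x, y, 1])."""
    w = np.zeros(3)
    augmented = np.hstack([points, np.ones((len(points), 1))])
    for _ in range(max_epochs):
        mistakes = 0
        for r, y in zip(augmented, labels):
            if y * np.dot(w, r) <= 0:  # misclassified, or exactly on the line
                w = w + y * r          # the update rule from the question
                mistakes += 1
        if mistakes == 0:              # a full pass with no update: done
            return w, True
    return w, False


rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(200, 2))

# Hypothetical "true" line; drop points near it so that a margin exists.
signed = points[:, 1] - (0.5 * points[:, 0] + 0.1)
keep = np.abs(signed) > 0.1
points, labels = points[keep], np.sign(signed[keep])

w, converged = perceptron(points, labels)
predictions = np.sign(np.hstack([points, np.ones((len(points), 1))]) @ w)

# Same conversion as above: w[0]*x + w[1]*y + w[2] = 0 solved for y.
slope_w = -w[0] / w[1]
intercept_w = -w[2] / w[1]
```

The fitted slope and intercept will generally not match the generating line exactly, because any line that falls inside the margin separates the points and the loop stops at the first one it finds.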

Why is this linear classifier algorithm wrong?