Best learning algorithm for concentric, non-linearly-separable data

Time: 2018-04-19 00:50:07

Tags: algorithm machine-learning

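Below are two scatter plots. The first shows data points that have only x and y values, and I would like to know whether there is a clustering algorithm that will automatically recognize that there are two clusters. The clusters are concentric and not linearly separable. K-means is not suitable, for several reasons. The second plot is similar, but each point has x, y, and color values, and I would like to know which learning algorithm is best suited to classifying or predicting the correct color from the x and y values.

[Image: the two scatter plots]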

1 answer:

Answer 0: (score: 0)

I got good classification results with sklearn's MLPClassifier. Here are the scatter plot and the contour plot:

[Image: scatter plot and contour plot]

Detailed code: https://www.linkedin.com/pulse/couple-scikit-learn-classifiers-peter-thorsteinson. The simplified code below shows how it works:

import math
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix  

# Generate the artificial data set and display the resulting scatter plot 

x = []
y = []
z = []
for i in range(500):
    rand = np.random.uniform(0.0, 2*math.pi)
    randx = np.random.normal(0.0, 30.0)
    randy = np.random.normal(0.0, 30.0)
    if np.random.random() > 0.5:
        z.append(0)
        x.append(100*math.cos(rand) + randx)
        y.append(100*math.sin(rand) + randy)
    else:
        z.append(1)
        x.append(300*math.cos(rand) + randx)
        y.append(300*math.sin(rand) + randy)

plt.axis('equal')
plt.axis([-500, 500, -500, 500])
plt.scatter(x, y, c=z)
plt.show()

# Run the MLPClassifier algorithm on the training data

XY = pd.DataFrame({'x': x, 'y': y})
print(XY.head())
Z = pd.DataFrame({'z': z})
print(Z.head())
XY_train, XY_test, Z_train, Z_test = train_test_split(XY, Z, test_size = 0.20)
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
mlp.fit(XY_train, Z_train.values.ravel())

# Make predictions on the test data and display resulting scatter plot

predictions = mlp.predict(XY_test)

print(confusion_matrix(Z_test,predictions))  
print(classification_report(Z_test,predictions))

plt.axis('equal')
plt.axis([-500, 500, -500, 500])
plt.scatter(XY_test.x, XY_test.y, c=predictions)
plt.show()
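
The code above covers the classification half of the question. For the clustering half, one possible approach (not part of the original answer) is to cluster on the distance from the center instead of on raw x and y, since the two rings differ mainly in radius. The sketch below assumes the rings share a known center at the origin, as they do in the generated data above; the helper `make_rings` and the choice of `KMeans` on a one-dimensional radius feature are illustrative assumptions, not the author's method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)


def make_rings(n=500, radii=(100.0, 300.0), noise=30.0):
    """Hypothetical helper: two noisy concentric rings centred on the origin,
    mirroring the data generation in the answer's code."""
    labels = rng.integers(0, 2, size=n)            # 0 = inner ring, 1 = outer ring
    angles = rng.uniform(0.0, 2 * np.pi, size=n)
    ring_radius = np.asarray(radii)[labels]
    x = ring_radius * np.cos(angles) + rng.normal(0.0, noise, size=n)
    y = ring_radius * np.sin(angles) + rng.normal(0.0, noise, size=n)
    return np.column_stack([x, y]), labels


XY, true_labels = make_rings()

# Assumption: the common centre is known (the origin here).
# Each point is reduced to a single feature, its distance from that centre.
radius = np.hypot(XY[:, 0], XY[:, 1]).reshape(-1, 1)

# K-means on the radius alone; the true labels are never shown to it.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(radius)
clusters = kmeans.labels_

# Cluster ids are arbitrary, so compare against both possible labelings.
agreement = max(np.mean(clusters == true_labels),
                np.mean(clusters != true_labels))
print(f"agreement with the generating labels: {agreement:.3f}")
```

The same radius feature could also be fed to a simple linear classifier for the colored data set, because in that one-dimensional space the two classes are separated by a single threshold. If the center is not known in advance, it would have to be estimated first, which this sketch does not attempt.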