我想分离数据并将其放入13个不同的变量集中,就像每个红色圆圈一样(请参见下图)。但是我不知道如何基于多元线性回归对数据进行聚类。知道如何在Python中执行此操作吗?
数据集: https://www.dropbox.com/s/ar5rzry0joe9ffu/dataset_v1.xlsx?dl=
我现在用于集群的代码:
print(__doc__)
import openpyxl
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
wb = openpyxl.load_workbook('dataset_v1.xlsx')
sheet = wb.worksheets[0]
ws = wb.active
row_count = sheet.max_row
data = np.zeros((row_count, 2))
index = 0
for r in ws.rows:
data[index,0] = r[0].value
data[index,1] = r[1].value
index += 1
# Compute DBSCAN
db = DBSCAN(eps=5, min_samples=0.1).fit(data)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
clusters = [data[labels == i] for i in range(n_clusters_)]
outliers = data[labels == -1]
# #############################################################################
# Plot result
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 0.5]
class_member_mask = (labels == k)
xy = data[class_member_mask & core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=14)
xy = data[class_member_mask & ~core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
答案 0 :(得分:0)
定义一个阈值,例如50。
只要y 增加超过50,就开始创建新的“集群”。 只要值减小,它们仍在先前的“群集”中。