Python中基于多重线性回归的数据聚类

时间:2018-12-08 12:45:13

标签: python cluster-analysis linear-regression

我想分离数据并将其放入13个不同的变量集中,就像每个红色圆圈一样(请参见下图)。但是我不知道如何基于多元线性回归对数据进行聚类。知道如何在Python中执行此操作吗?

I don't read what I write

数据集: https://www.dropbox.com/s/ar5rzry0joe9ffu/dataset_v1.xlsx?dl=

我现在用于集群的代码:

print(__doc__)

import openpyxl
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt


wb = openpyxl.load_workbook('dataset_v1.xlsx')
sheet = wb.worksheets[0]
ws = wb.active
row_count = sheet.max_row
data = np.zeros((row_count, 2))

index = 0
for r in ws.rows:
    data[index,0] = r[0].value
    data[index,1] = r[1].value
    index += 1


# Compute DBSCAN
db = DBSCAN(eps=5, min_samples=0.1).fit(data)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

clusters = [data[labels == i] for i in range(n_clusters_)]
outliers = data[labels == -1]

# #############################################################################
# Plot result

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 0.5]

    class_member_mask = (labels == k)

    xy = data[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)

    xy = data[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

1 个答案:

答案 0 :(得分:0)

定义一个阈值,例如50。

只要y 增加超过50,就开始创建新的“集群”。 只要值减小,它们仍在先前的“群集”中。