Question

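What is the smartest way to detect curves in a 2D dataset? The data points must be clustered by defining a maximum distance to their neighbours. My goal is to apply the polyfit function to each curve and reuse the result as a template for similar datasets.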

Example data:

array([[0., 0., 0., ..., 2020., 2020., 2020.], [51., 76., 194., ..., 1862., 1915., 2021.]])

I figured out that this can be done with agglomerative clustering. Here are the code and the results:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Reshape the data into an (n_samples, 2) array of [x, y] rows

a = array[:, 0].flatten()
b = array[:, 1].flatten()

array_new1 = np.column_stack([a, b])

# Clustering algorithm

# n_clusters=None lets distance_threshold decide how many clusters are formed.
# Note: scikit-learn 1.2 renamed `affinity` to `metric`; `affinity` was removed in 1.4.
model = AgglomerativeClustering(n_clusters=None,
                                affinity='euclidean',
                                linkage='single',
                                compute_full_tree=True,
                                distance_threshold=15)
model.fit(array_new1)
labels = model.labels_
n_clusters = len(set(labels))
print(n_clusters)

cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, n_clusters)]

plt.figure(figsize=(15,15))
# Labels run from 0 to n_clusters - 1, so enumerate from 0
for i, color in enumerate(colors):
    plt.scatter(array_new1[labels==i,0], array_new1[labels==i,1], color=color)
plt.gca().invert_yaxis()
plt.show()

![](https://i.stack.imgur.com/utwqP.png)

# Plotting the result

data = pd.DataFrame({'x': array_new1[:, 0],
                     'y': array_new1[:, 1],
                     'label': labels})

# sort_values returns a new DataFrame, so keep the result
data = data.sort_values(by='label')

counter = 0
plt.figure(figsize=(15,15))
plt.scatter(5*array[:, 0], array[:, 1])
for i in range(n_clusters):
    cluster = data.loc[data['label'] == i]

    # Fit only clusters with more than 50 and fewer than 1000 points
    if 50 < len(cluster) < 1000:
        counter += 1

        # Second-degree polynomial fit of y against x
        z = np.polyfit(cluster['x'], cluster['y'], 2)
        p = np.poly1d(z)
        # tasku_sk is not defined in this snippet; it must be set beforehand
        xp = np.linspace(0, tasku_sk, 50)

        #plt.scatter(cluster['x'], cluster['y'])
        plt.plot(5*xp, p(xp), c='r', lw=4)

plt.gca().invert_yaxis()
plt.show()

print(counter)

![](https://i.stack.imgur.com/AQHOf.png)

22
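
To reuse the fits as a template, one option is to keep the fitted polynomial of each cluster in a dictionary keyed by label. The following is only a sketch under assumptions: the helper name `fit_cluster_polynomials` and the synthetic demo data are hypothetical and not part of the code above, while the size filter (more than 50, fewer than 1000 points) and the degree 2 are taken from it.

```python
import numpy as np
import pandas as pd


def fit_cluster_polynomials(x, y, labels, degree=2, min_points=50, max_points=1000):
    """Fit one polynomial per cluster and return {label: np.poly1d}.

    Hypothetical helper: the name and signature are illustrative only.
    """
    data = pd.DataFrame({"x": x, "y": y, "label": labels})
    templates = {}
    for label, group in data.groupby("label"):
        # Same size filter as in the question: skip very small and very large clusters.
        if min_points < len(group) < max_points:
            coeffs = np.polyfit(group["x"], group["y"], degree)
            templates[int(label)] = np.poly1d(coeffs)
    return templates


# Synthetic demo data (not the asker's data): two noiseless parabolas of 60 points each.
x = np.tile(np.linspace(0.0, 100.0, 60), 2)
labels = np.repeat([0, 1], 60)
y = np.where(labels == 0, 0.01 * x**2 + 5.0, 0.02 * x**2 + 200.0)

templates = fit_cluster_polynomials(x, y, labels)

# Each stored polynomial can be evaluated later on any x grid.
xp = np.linspace(0.0, 100.0, 50)
curves = {label: p(xp) for label, p in templates.items()}
```

Each `np.poly1d` object in the dictionary can then be evaluated on another dataset's x range without repeating the clustering step.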

Answer 1

Yes.

Use what is said to be the oldest of all clustering algorithms: single-link.
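
As an illustration of single-link clustering with a maximum neighbour distance, here is a minimal sketch that uses SciPy's hierarchical clustering instead of scikit-learn. The function name `single_link_clusters`, the threshold of 15 and the synthetic points are assumptions chosen for the demo. Note that `fcluster` numbers clusters from 1, and its cut rule may differ from scikit-learn's `distance_threshold` for points lying exactly at the threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def single_link_clusters(points, max_gap):
    """Group points so that each cluster is connected by steps of at most max_gap.

    Hypothetical helper: builds a single-linkage tree and cuts it at max_gap.
    """
    tree = linkage(points, method="single", metric="euclidean")
    return fcluster(tree, t=max_gap, criterion="distance")


# Synthetic demo data (not the asker's data): two parabolas 300 units apart in y.
x = np.arange(0.0, 100.0, 5.0)
curve_a = np.column_stack([x, 0.01 * x**2])
curve_b = np.column_stack([x, 0.01 * x**2 + 300.0])
points = np.vstack([curve_a, curve_b])

labels = single_link_clusters(points, max_gap=15.0)
n_found = len(set(labels))
```

Because single-link merges clusters through their closest pair of points, it follows elongated, curve-like shapes as long as consecutive points on a curve are closer than the threshold and separate curves are farther apart than that.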
