Python: "too many indices for array" when using a DataFrame

Date: 2017-07-20 12:20:02

Tags: python

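I am writing a Python program that computes the Dunn index to evaluate clustering performance. I took the relevant code from a website; it needs to compute the minimum distance between clusters and the maximum distance within a cluster: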

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
...
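def delta_fast(ck,cl,distances):
    # block of distances between points of cluster k (rows) and cluster l (columns)
    values = distances[np.where(ck)][:,np.where(cl)]
    print(values)

def dunn_fast(points,labels):
    # pairwise Euclidean distances between all points (n x n)
    distances = euclidean_distances(points)
    print("distances")
    print(distances)
    print(distances.shape[0])
    print(distances.shape[1])

    # sorted unique cluster labels
    ks = np.sort(np.unique(labels))
    print("ks")
    print(ks)

    # inter-cluster distances, initialised to a large value
    deltas = np.ones([len(ks),len(ks)]) * 1000000
    # intra-cluster diameters
    big_deltas = np.zeros([len(ks),1])

    l_range = list(range(0,len(ks)))

    # fill deltas[k, l] for every pair of distinct clusters k != l
    for k in l_range:
        for l in (l_range[0:k] + l_range[k+1:]):
            deltas[k,l] = delta_fast((labels == ks[k]),(labels == ks[l]),distances)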
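For reference, the quantity this code is building toward is the Dunn index: the smallest inter-cluster distance divided by the largest intra-cluster diameter. Assuming the common form where δ is the minimum pairwise distance between two clusters (what `delta_fast` appears to compute) and Δ is the maximum pairwise distance inside one cluster, it reads:

```latex
D = \frac{\min_{k \neq l} \delta(C_k, C_l)}{\max_{m} \Delta(C_m)},
\qquad
\delta(C_k, C_l) = \min_{x \in C_k,\; y \in C_l} d(x, y),
\qquad
\Delta(C_m) = \max_{x, y \in C_m} d(x, y)
```

Here `deltas[k, l]` holds δ for each pair of distinct clusters and `big_deltas` is meant to hold Δ for each cluster; the part of `dunn_fast` that fills `big_deltas` and returns D is not shown in the question.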

`distances` is 1406 × 1406 (computed from a DataFrame), but the code fails with:

Traceback (most recent call last):
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_dunnIndex.py", line 100, in <module>
    get_group_members_cluster_info(cluster_method,cluster_number)
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_dunnIndex.py", line 89, in get_group_members_cluster_info
    dunn_fast(cal_cluster_data_df,cluster_data_label_df)
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_dunnIndex.py", line 48, in dunn_fast
    deltas[k,l] = delta_fast((labels == ks[k]),(labels == ks[l]),distances)
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_dunnIndex.py", line 12, in delta_fast
    values = distances[np.where(ck)][:,np.where(cl)]
IndexError: too many indices for array

The failing line seems to be: values = distances[np.where(ck)][:,np.where(cl)]
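A likely cause, assuming `cluster_data_label_df` (passed as `labels` according to the traceback) is a one-column DataFrame: comparing a DataFrame with a scalar yields a 2-D boolean DataFrame, so `np.where(ck)` returns two index arrays (rows and columns) instead of one. `distances[np.where(ck)]` then selects individual elements and produces a 1-D array, and the following `[:, ...]` asks for a second axis that no longer exists. The small reproduction below uses made-up toy data (the names `labels_df` and the column `label` are hypothetical) and builds the distance matrix with plain NumPy so that it runs without scikit-learn:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 points in two clusters, with the labels kept
# in a one-column DataFrame (as cluster_data_label_df appears to be).
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels_df = pd.DataFrame({"label": [0, 0, 1, 1]})

# Pairwise Euclidean distance matrix (4 x 4), a plain NumPy array
distances = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))

# Comparing a DataFrame with a scalar gives a 2-D (4 x 1) boolean DataFrame ...
ck_2d = labels_df == 0
cl_2d = labels_df == 1
# ... so np.where returns TWO index arrays: (row indices, column indices)
print(len(np.where(ck_2d)))  # 2

# distances[(rows, cols)] picks single elements, giving a 1-D array
picked = distances[np.where(ck_2d)]
print(picked.ndim)  # 1

# Indexing that 1-D array with [:, ...] is what raises the IndexError
try:
    picked[:, np.where(cl_2d)]
    error = None
except IndexError as exc:
    error = exc
print(error)

# With 1-D labels, np.where returns ONE index array and the slicing works
labels = labels_df["label"].to_numpy()
ck, cl = labels == 0, labels == 1
print(len(np.where(ck)))  # 1
print(distances[ck][:, cl].shape)  # (2, 2)
```

On older pandas versions, `labels_df["label"].values` gives the same 1-D array as `.to_numpy()`.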

Could you tell me why this happens and how to fix it?
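One possible fix, sketched under assumptions: flatten the labels to a 1-D NumPy array before building the masks (`np.asarray(labels).ravel()` accepts a 1-D array, a Series, or a one-column DataFrame), and make `delta_fast` return a value, since the version above only prints `values` and returns `None`, which cannot be stored in the float array `deltas`. The sketch follows the usual Dunn index definition; the helpers `euclidean_distance_matrix` and `big_delta_fast`, and the use of boolean-mask indexing instead of `np.where`, are my own illustrative choices, not the code from the original website. `euclidean_distances(points)` from scikit-learn should give the same matrix as the NumPy helper used here.

```python
import numpy as np
import pandas as pd


def euclidean_distance_matrix(points):
    """Hypothetical helper: n x n Euclidean distances using NumPy only."""
    x = np.asarray(points, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def delta_fast(ck, cl, distances):
    # Smallest distance between a point of cluster k and a point of cluster l.
    # ck and cl must be 1-D boolean masks, so the selection is a 2-D block.
    values = distances[ck][:, cl]
    return values.min()


def big_delta_fast(ci, distances):
    # Diameter of one cluster: the largest distance between two of its points.
    values = distances[ci][:, ci]
    return values.max()


def dunn_fast(points, labels):
    # Accept a 1-D array, a Series, or a one-column DataFrame of labels.
    labels = np.asarray(labels).ravel()
    distances = euclidean_distance_matrix(points)
    ks = np.unique(labels)  # np.unique already returns sorted labels

    # Diagonal stays at +inf so it never wins the minimum.
    deltas = np.full((len(ks), len(ks)), np.inf)
    big_deltas = np.zeros(len(ks))

    for k in range(len(ks)):
        for l in range(len(ks)):
            if l != k:
                deltas[k, l] = delta_fast(labels == ks[k], labels == ks[l], distances)
        big_deltas[k] = big_delta_fast(labels == ks[k], distances)

    return deltas.min() / big_deltas.max()


# Usage with hypothetical toy data, labels passed as a one-column DataFrame
points = pd.DataFrame([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels_df = pd.DataFrame({"label": [0, 0, 1, 1]})
print(dunn_fast(points, labels_df))  # sqrt(41) / 1, about 6.403
```

Flattening inside `dunn_fast` keeps the caller unchanged, so `dunn_fast(cal_cluster_data_df, cluster_data_label_df)` could stay as it is; passing `cluster_data_label_df.values.ravel()` at the call site would work just as well.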
