Question

I have this function, which is part of a file I created for the K-Means clustering algorithm.

def assign_to_cluster_mean_centroid(x_in=x, centroids_in=centroids, n_user=n):
    '''This function calculates the euclidean distance between each data point and
    a cluster centroid. It then allocates each data point to the centroid that it is the
    closest to in distance.'''
    distances_arr_re = np.reshape(distance_between(
        centroids_in, x_in[0]), (len(centroids_in), len(x_in[0])))
    datapoint_cen = []
    distances_min = []  # Done if needed
    for value in zip(*distances_arr_re):
        distances_min.append(min(value))
        datapoint_cen.append(np.argmin(value)+1)

    clusters = {}
    for no_user in range(0, n_user):
        clusters[no_user+1] = []

    for d_point, cent in zip(x_in[0], datapoint_cen):
        clusters[cent].append(d_point)

    # Run a for loop and rewrite the centroids
    # with the newly calculated means
    for i, cluster in enumerate(clusters):
        reshaped = np.reshape(clusters[cluster], (len(clusters[cluster]), 2))
        centroids[i][0] = sum(reshaped[0:, 0])/len(reshaped[0:, 0])
        centroids[i][1] = sum(reshaped[0:, 1])/len(reshaped[0:, 1])
    print('Centroids for this iteration are:' + str(centroids))
    return datapoint_cen, clusters

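This function returns two values: a list (datapoint_cen) containing the label of each data point, derived from its distance to the nearest centroid, and a dictionary (clusters) that maps each cluster to the data points assigned to it.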
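For reference, the assignment step described above can be sketched more compactly with NumPy broadcasting. This is only an illustration of the same idea, not the code from my file: the name assign_points and the toy data are made up, and unlike the function above it does not update the centroids.

```python
import numpy as np

def assign_points(points, centroids):
    """Return (labels, clusters) for one assignment step.

    labels[i] is the 1-based index of the centroid nearest to points[i];
    clusters maps each 1-based label to the list of points assigned to it.
    (Hypothetical helper, for illustration only.)
    """
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = (np.argmin(distances, axis=1) + 1).tolist()
    clusters = {k: [] for k in range(1, len(centroids) + 1)}
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return labels, clusters

points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels, clusters = assign_points(points, centroids)
print(labels)  # [1, 1, 2]
```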

I then have a main loop in which I call this function twice, as follows:

# Create the dataframe for visualisation
cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': assign_to_cluster_mean_centroid()[0],
                             'Country': x[1]})

and

mean = assign_to_cluster_mean_centroid()[1]

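My problem is that the second call, whose result I assign to the variable mean, recomputes everything and returns a new set of values for the clusters. For my algorithm to be accurate, I need the second call to give me the clusters from the first call. Any help would be greatly appreciated.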
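A minimal sketch of why the two calls disagree, assuming the cause is that the function overwrites the centroids as a side effect (as the loop over clusters above does). The function assign_and_update and the one-dimensional toy data are hypothetical stand-ins, not the real code:

```python
import numpy as np

def assign_and_update(points, centroids):
    """Hypothetical stand-in: label points, then overwrite centroids in place."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(distances, axis=1)
    for k in range(len(centroids)):
        members = points[labels == k]
        if len(members) > 0:  # leave an empty cluster's centroid unchanged
            centroids[k] = members.mean(axis=0)  # side effect on the caller's array
    return labels.tolist()

points = np.array([[0.0], [2.0], [3.0], [10.0]])
centroids = np.array([[0.0], [3.5]])

first = assign_and_update(points, centroids)   # labels against the initial centroids
second = assign_and_update(points, centroids)  # labels against the updated centroids
print(first)   # [0, 1, 1, 1]
print(second)  # [0, 0, 1, 1]
```

Each call moves the centroids, so the second call effectively runs the next iteration rather than repeating the first one.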

Answer 1

Could you assign the result of the function call to a variable?

For example:

assigning = assign_to_cluster_mean_centroid()

and then index into the result? For example:

cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': assigning[0],
                             'Country': x[1]})

and later:

mean = assigning[1]

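I don't know what loop you are in at that point, so I am not 100% sure whether you will run into scope issues.

Alternatively, you can use multiple assignment to unpack the return values.

For example: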

label, mean = assign_to_cluster_mean_centroid()

This means the second part is already done (mean already holds the clusters), and you only need:

cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': label,
                             'Country': x[1]})
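Inside a loop the same pattern applies: call the function once per iteration and reuse both values. Below is a rough sketch under my own assumptions, since I have not seen your main loop. The function kmeans_step is hypothetical, and it returns new centroids instead of modifying the ones passed in, which is a design choice of this sketch rather than what your function does:

```python
import numpy as np

def kmeans_step(points, centroids):
    """Hypothetical pure step: returns (labels, new_centroids), mutating nothing."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(distances, axis=1)
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = points[labels == k]
        if len(members) > 0:  # keep the old centroid if a cluster is empty
            new_centroids[k] = members.mean(axis=0)
    return labels, new_centroids

points = np.array([[0.0], [2.0], [3.0], [10.0]])
centroids = np.array([[0.0], [3.5]])

for iteration in range(10):
    # One call per iteration; both return values come from the same computation.
    labels, new_centroids = kmeans_step(points, centroids)
    if np.allclose(new_centroids, centroids):
        break  # centroids stopped moving
    centroids = new_centroids

print(labels.tolist())  # [0, 0, 0, 1]
```

Because the step does not touch its inputs, calling it again with the same arguments would give the same answer, so an accidental second call is harmless.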

Hope this helps.

Getting the first and second return values of a function called twice in a for loop
