Question

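I am building a clustering feature in a Django application using the K-Prototypes algorithm.

Right now I am testing everything with fake data, to understand how the algorithm works and to verify that it does.

My clustering and prediction functions are: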

def ClusterCreation(request,*args):
    global kproto
    # random categorical data
    data = np.array([
            [0,'a',4],
            [1,'e',3],
            [3,'ffe',7],
            [5,'fdfd',16]
            ])

    kproto = KPrototypes(n_clusters=2, init='Cao', verbose=2)
    clusters = kproto.fit_predict(data, categorical=[1,2])

    # Create CSV with cluster statistics
    clusterStatisticsCSV(kproto)
    for argument in args:
        if argument is not None:
            return

    # Print the cluster centroids
    return HttpResponse('Clustering ok')

def ClusterPrediction(request):

    global kproto

    if (kproto==0):
        ClusterCreation(None,1)

    # random point to fit
    data = np.array([0,'a',4])
    fit_label = kproto.predict(data, categorical=[0,1]) #categorical is the Index of columns that contain categorical data

    # Print the cluster centroids
    return HttpResponse('Point '+str(data)+' is in cluster '+str(fit_label))

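I can run the ClusterCreation function without any problem, but I have now added the function that predicts the cluster of a new data point.

You will see a function called clusterStatisticsCSV. It works fine and is just a simple CSV export.

I get the following error log: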

Initialization method and algorithm are deterministic. Setting n_init to 1.
dz01     | Init: initializing centroids
dz01     | Init: initializing clusters
dz01     | Starting iterations...
dz01     | Run: 1, iteration: 1/100, moves: 0, ncost: 8.50723954060097
dz01     | Internal Server Error: /cluster/clusterPrediction/
dz01     | Traceback (most recent call last):
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 35, in inner
dz01     |     response = get_response(request)
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 128, in _get_response
dz01     |     response = self.process_exception_by_middleware(e, request)
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 126, in _get_response
dz01     |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
dz01     |   File "/src/cluster/views.py", line 62, in ClusterPrediction
dz01     |     fit_label = kproto.predict(data, categorical=[0,1]) #categorical is the Index of columns that contain categorical data
dz01     |   File "/usr/local/lib/python3.5/site-packages/kmodes/kprototypes.py", line 438, in predict
dz01     |     Xnum, Xcat = _split_num_cat(X, categorical)
dz01     |   File "/usr/local/lib/python3.5/site-packages/kmodes/kprototypes.py", line 44, in _split_num_cat
dz01     |     Xnum = np.asanyarray(X[:, [ii for ii in range(X.shape[1])
dz01     | IndexError: tuple index out of range

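I know where the error occurs. I guess it has to do with kproto.predict(data, categorical=[0,1]), specifically with the categorical column indices. I have tried other values in the hope of finding a fix, but I still cannot fully understand what is happening or solve it.

I am also worried about the same categorical parameter in the ClusterCreation function, because it may be wrong as well, in which case the clusters are wrong.

What am I missing?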

Answer 1

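Solved!

There was an error in the data array. predict expects a 2-D array (rows by columns), but I was passing a 1-D array, which has no second dimension, so X.shape[1] raised the IndexError.

Before: data = np.array([0, 'a', 3])
Corrected: data = np.array([[0, 'a', 3]])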
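The cause can be reproduced with NumPy alone, without kmodes. The following is a minimal sketch: a flat list gives a 1-D array whose shape tuple has a single entry, so the same X.shape[1] lookup that appears in the traceback fails, while a nested list gives one row with three columns. The variable names are made up for this illustration.

```python
import numpy as np

# Flat list: a 1-D array, so the shape tuple has only one entry.
point_1d = np.array([0, 'a', 3])
print(point_1d.shape)  # (3,)

try:
    point_1d.shape[1]  # same kind of lookup that fails inside predict()
except IndexError as exc:
    print("IndexError:", exc)

# Nested list: a 2-D array with one row and three columns.
point_2d = np.array([[0, 'a', 3]])
print(point_2d.shape)  # (1, 3)

# An existing 1-D point can also be reshaped into a single row.
row = point_1d.reshape(1, -1)
print(row.shape)  # (1, 3)
```

If the point already exists as a 1-D array, reshape(1, -1) is probably the least intrusive way to turn it into a single-row 2-D array.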

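After reading through the whole kprototypes.py file, I saw that categorical is a parameter that lists the index of every column in the 2-D array that holds categorical data. So if you pass categorical=[1,2], you are saying that the second and third columns (Python indexing starts at 0) are categorical variables.

An example array for this is: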

data = np.array([
            [0,'a','rete'],
            [1,'e','asd'],
            ])
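To make the index convention concrete, here is a small NumPy-only sketch. Note that split_columns is a hypothetical helper written for this illustration, not the function from kmodes; it only shows which columns a given categorical list selects. It is applied first to the example array above and then to the training data from the question.

```python
import numpy as np

def split_columns(X, categorical):
    """Hypothetical helper (not part of kmodes): split X by column index."""
    X = np.asarray(X)
    numeric = [i for i in range(X.shape[1]) if i not in categorical]
    return X[:, numeric], X[:, categorical]

# The example array from this answer.
data = np.array([
    [0, 'a', 'rete'],
    [1, 'e', 'asd'],
])
num_part, cat_part = split_columns(data, categorical=[1, 2])
# A mixed int/str array is stored as strings, hence '0' and '1'.
print(num_part.tolist())  # [['0'], ['1']]
print(cat_part.tolist())  # [['a', 'rete'], ['e', 'asd']]

# The training data from the question, with the indices used there.
train = np.array([
    [0, 'a', 4],
    [1, 'e', 3],
    [3, 'ffe', 7],
    [5, 'fdfd', 16],
])
train_num, train_cat = split_columns(train, categorical=[1, 2])
print(train_num[:, 0].tolist())  # ['0', '1', '3', '5']
print(train_cat[:, 1].tolist())  # ['4', '3', '7', '16']
```

With categorical=[1,2], the third column of the question's data (4, 3, 7, 16) lands on the categorical side, so it is treated as labels and not as numbers. If that column is meant to be numeric, categorical=[1] looks like the better fit for that data. The predict call in the question also passes [0,1], which is not the list used for fitting, so it seems safer to pass the same list to both calls.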

K-Prototypes algorithm: tuple index out of range
