KPrototypes algorithm: tuple index out of range

Posted: 2018-07-27 21:56:24

Tags: python algorithm

I'm building a clustering feature in a Django application using the KPrototypes algorithm.

For now, I'm testing all my algorithms with dummy data, to understand how they work and to verify that they work correctly.

My clustering and prediction functions are:

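import numpy as np
from django.http import HttpResponse
from kmodes.kprototypes import KPrototypes

kproto = 0  # module-level placeholder until a model has been fitted

def ClusterCreation(request,*args):
    global kproto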
    # dummy mixed numeric/categorical data (NumPy stores every value as a string)
    data = np.array([
            [0,'a',4],
            [1,'e',3],
            [3,'ffe',7],
            [5,'fdfd',16]
            ])

    kproto = KPrototypes(n_clusters=2, init='Cao', verbose=2)
    clusters = kproto.fit_predict(data, categorical=[1,2])

    # Create CSV with cluster statistics
    clusterStatisticsCSV(kproto)
    for argument in args:
        if argument is not None:
            return

    # Report that clustering succeeded
    return HttpResponse('Clustering ok')

def ClusterPrediction(request):

    global kproto

    if (kproto==0):
        ClusterCreation(None,1)

    # single dummy point to predict
    data = np.array([0,'a',4])
    fit_label = kproto.predict(data, categorical=[0,1]) #categorical is the Index of columns that contain categorical data

    # Return the predicted cluster label
    return HttpResponse('Point '+str(data)+' is in cluster '+str(fit_label))

I can run the ClusterCreation function without any problem, but now I've added this function to predict the cluster of a new data point.

You'll see a function called clusterStatisticsCSV; it works fine and is just a simple CSV export.

I get the following error log:

Initialization method and algorithm are deterministic. Setting n_init to 1.
dz01     | Init: initializing centroids
dz01     | Init: initializing clusters
dz01     | Starting iterations...
dz01     | Run: 1, iteration: 1/100, moves: 0, ncost: 8.50723954060097
dz01     | Internal Server Error: /cluster/clusterPrediction/
dz01     | Traceback (most recent call last):
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 35, in inner
dz01     |     response = get_response(request)
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 128, in _get_response
dz01     |     response = self.process_exception_by_middleware(e, request)
dz01     |   File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 126, in _get_response
dz01     |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
dz01     |   File "/src/cluster/views.py", line 62, in ClusterPrediction
dz01     |     fit_label = kproto.predict(data, categorical=[0,1]) #categorical is the Index of columns that contain categorical data
dz01     |   File "/usr/local/lib/python3.5/site-packages/kmodes/kprototypes.py", line 438, in predict
dz01     |     Xnum, Xcat = _split_num_cat(X, categorical)
dz01     |   File "/usr/local/lib/python3.5/site-packages/kmodes/kprototypes.py", line 44, in _split_num_cat
dz01     |     Xnum = np.asanyarray(X[:, [ii for ii in range(X.shape[1])
dz01     | IndexError: tuple index out of range

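I know where the error comes from: I think it's related to kproto.predict(data, categorical=[0,1]), specifically the indices of the categorical columns. However, even after changing them to test other values, I still can't fully understand what is happening or fix it.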

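I'm also worried about the same categorical parameter in the ClusterCreation function: it might be wrong too, in which case the clusters would be wrong as well.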

What am I missing?

1 answer:

Answer 0 (score: 0)

Solved!

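There was an error in the data array.
Before: data = np.array([0, 'a', 3])
Corrected: data = np.array([[0, 'a', 3]])

predict expects a 2-D array with one row per sample. The original array is 1-D, with shape (3,), so X.shape[1] inside _split_num_cat raises "IndexError: tuple index out of range". The extra pair of brackets turns it into a single row with shape (1, 3).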
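A minimal sketch of the shape problem using only NumPy (kmodes is not needed to see it). It reproduces the same IndexError by reading shape[1] on the 1-D point. The np.atleast_2d and reshape(1, -1) calls are shown as possible alternatives to adding the brackets by hand; they are a suggestion, not part of the original fix.

```python
import numpy as np

# Same single point as in the answer, before and after the fix.
point_1d = np.array([0, 'a', 3])
point_2d = np.array([[0, 'a', 3]])

print(point_1d.shape)  # (3,)   -> only one axis, so shape[1] does not exist
print(point_2d.shape)  # (1, 3) -> one sample with three features

try:
    point_1d.shape[1]  # the same kind of lookup as X.shape[1] in the traceback
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: tuple index out of range

# Two equivalent ways to turn a single 1-D point into a one-row 2-D array:
print(np.atleast_2d(point_1d).shape)  # (1, 3)
print(point_1d.reshape(1, -1).shape)  # (1, 3)
```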

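Also, after reading through the whole kprototypes.py file, I saw that categorical is a parameter that gives the index of each column of the multidimensional array that contains categorical data. So if you pass categorical=[1,2], you are saying that the second and third columns (Python indexing starts at 0) are categorical variables.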
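To see what the categorical indices select, here is a small sketch. split_columns is a hypothetical helper written only for illustration: it mimics the idea of splitting columns by index, but it is not the kmodes implementation. Applied to the question's training data, it shows that categorical=[1,2] also puts the third column (4, 3, 7, 16) in the categorical group, whereas categorical=[1] treats only the string column as categorical. Presumably the same list should also be passed to predict, so that the columns line up with the fitted model.

```python
import numpy as np

def split_columns(X, categorical):
    """Hypothetical helper: split a 2-D array into numeric and categorical
    parts, given the indices of the categorical columns. Illustration only;
    this is not the kmodes code."""
    X = np.asarray(X)
    numeric = [i for i in range(X.shape[1]) if i not in categorical]
    Xnum = X[:, numeric].astype(np.float64)  # numeric columns must parse as numbers
    Xcat = X[:, categorical]
    return Xnum, Xcat

# Training data from the question (NumPy stores every value as a string here).
data = np.array([
    [0, 'a', 4],
    [1, 'e', 3],
    [3, 'ffe', 7],
    [5, 'fdfd', 16],
])

Xnum, Xcat = split_columns(data, categorical=[1, 2])
print(Xnum.shape, Xcat.shape)  # (4, 1) (4, 2): columns 1 and 2 treated as categorical

Xnum, Xcat = split_columns(data, categorical=[1])
print(Xnum.shape, Xcat.shape)  # (4, 2) (4, 1): only the string column is categorical
```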

An example multidimensional array for that case is:

data = np.array([
            [0,'a','rete'],
            [1,'e','asd'],
            ])