DBSCAN clustering ValueError

Date: 2018-01-30 15:28:15

Tags: python windows dbscan

I am getting a ValueError, and judging by the error name it is not a memory error. I have sparse data from a doc2vec model that I pass to a DBSCAN model. Here is the code I am using:

from sklearn.cluster import DBSCAN

myModel = DBSCAN(eps=0.3, min_samples=100, algorithm='brute')
myModel.fit(doc2vec_output_vector)

When I run this code, I always get the error below.

ValueError                                Traceback (most recent call last)
<ipython-input-10-9a3b7f63ff7c> in <module>()
----> 1 a.fit(textVect)

c:\python27\lib\site-packages\sklearn\cluster\dbscan_.pyc in fit(self, X, y, sample_weight)
    282         X = check_array(X, accept_sparse='csr')
    283         clust = dbscan(X, sample_weight=sample_weight,
--> 284                        **self.get_params())
    285         self.core_sample_indices_, self.labels_ = clust
    286         if len(self.core_sample_indices_):

c:\python27\lib\site-packages\sklearn\cluster\dbscan_.pyc in dbscan(X, eps, min_samples, metric, metric_params, algorithm, leaf_size, p, sample_weight, n_jobs)
    143         # This has worst case O(n^2) memory complexity
    144         neighborhoods = neighbors_model.radius_neighbors(X, eps,
--> 145                                                          return_distance=False)
    146 
    147     if sample_weight is None:

c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in radius_neighbors(self, X, radius, return_distance)
    588             if self.effective_metric_ == 'euclidean':
    589                 dist = pairwise_distances(X, self._fit_X, 'euclidean',
--> 590                                           n_jobs=self.n_jobs, squared=True)
    591                 radius *= radius
    592             else:

c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1245         func = partial(distance.cdist, metric=metric, **kwds)
   1246 
-> 1247     return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1248 
   1249 

c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1088     if n_jobs == 1:
   1089         # Special case to avoid picklability checks in delayed
-> 1090         return func(X, Y, **kwds)
   1091 
   1092     # TODO: in some cases, backend='threading' may be appropriate

c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    244         YY = row_norms(Y, squared=True)[np.newaxis, :]
    245 
--> 246     distances = safe_sparse_dot(X, Y.T, dense_output=True)
    247     distances *= -2
    248     distances += XX

c:\python27\lib\site-packages\sklearn\utils\extmath.pyc in safe_sparse_dot(a, b, dense_output)
    138         return ret
    139     else:
--> 140         return np.dot(a, b)
    141 
    142 

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

My DBSCAN input data has shape 300000 x 300. Can anyone help me?
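
For reference, the traceback ends in euclidean_distances, which computes the product of X and Y.T as a dense array, and the comment in dbscan_.py notes that this path has worst-case O(n^2) memory. A rough back-of-the-envelope sketch of that array's size for the 300000 rows mentioned above follows. The helper name pairwise_matrix_bytes is made up for illustration, and the 8-byte item size assumes float64 (with float32 the figure would be half). The comparison against a 32-bit limit is also only an assumption about the Python build, since the post does not say whether it is 32-bit or 64-bit.

```python
# Rough size of a dense n x n pairwise-distance matrix.
# pairwise_matrix_bytes is a hypothetical helper, not part of scikit-learn.

def pairwise_matrix_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense n_samples x n_samples array.

    itemsize=8 assumes float64; pass 4 for float32.
    """
    return n_samples * n_samples * itemsize


n_samples = 300000  # row count stated in the question
needed = pairwise_matrix_bytes(n_samples)

print(needed)        # 720000000000 bytes
print(needed / 1e9)  # 720.0 (GB)

# Assumption: on a 32-bit build, an array cannot exceed what a signed
# 32-bit index can address, so the request is rejected before allocation.
max_32bit = 2**31 - 1
print(needed > max_32bit)  # True
```

If these assumptions hold, the requested array is several hundred gigabytes no matter how the 300 feature columns are stored, which would be consistent with the "array is too big" message.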

0 answers:

No answers