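I am getting a `ValueError`, and judging by the error name it is not a memory error. I have sparse data from a doc2vec model that I am feeding into a DBSCAN model. This is the code I am using: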
from sklearn.cluster import DBSCAN

myModel = DBSCAN(eps=0.3, min_samples=100, algorithm='brute')
myModel.fit(doc2vec_output_vector)
Whenever I run this code, I get the error below.
ValueError Traceback (most recent call last)
<ipython-input-10-9a3b7f63ff7c> in <module>()
----> 1 a.fit(textVect)
c:\python27\lib\site-packages\sklearn\cluster\dbscan_.pyc in fit(self, X, y, sample_weight)
282 X = check_array(X, accept_sparse='csr')
283 clust = dbscan(X, sample_weight=sample_weight,
--> 284 **self.get_params())
285 self.core_sample_indices_, self.labels_ = clust
286 if len(self.core_sample_indices_):
c:\python27\lib\site-packages\sklearn\cluster\dbscan_.pyc in dbscan(X, eps, min_samples, metric, metric_params, algorithm, leaf_size, p, sample_weight, n_jobs)
143 # This has worst case O(n^2) memory complexity
144 neighborhoods = neighbors_model.radius_neighbors(X, eps,
--> 145 return_distance=False)
146
147 if sample_weight is None:
c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in radius_neighbors(self, X, radius, return_distance)
588 if self.effective_metric_ == 'euclidean':
589 dist = pairwise_distances(X, self._fit_X, 'euclidean',
--> 590 n_jobs=self.n_jobs, squared=True)
591 radius *= radius
592 else:
c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in pairwise_distances(X, Y, metric, n_jobs, **kwds)
1245 func = partial(distance.cdist, metric=metric, **kwds)
1246
-> 1247 return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
1248
1249
c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in _parallel_pairwise(X, Y, func, n_jobs, **kwds)
1088 if n_jobs == 1:
1089 # Special case to avoid picklability checks in delayed
-> 1090 return func(X, Y, **kwds)
1091
1092 # TODO: in some cases, backend='threading' may be appropriate
c:\python27\lib\site-packages\sklearn\metrics\pairwise.pyc in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
244 YY = row_norms(Y, squared=True)[np.newaxis, :]
245
--> 246 distances = safe_sparse_dot(X, Y.T, dense_output=True)
247 distances *= -2
248 distances += XX
c:\python27\lib\site-packages\sklearn\utils\extmath.pyc in safe_sparse_dot(a, b, dense_output)
138 return ret
139 else:
--> 140 return np.dot(a, b)
141
142
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
My input data for DBSCAN has shape 300000 x 300. Can anybody help me?
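For scale, here is a rough back-of-the-envelope estimate of the array that the brute-force path in the traceback (`pairwise_distances` followed by `np.dot(X, Y.T)`) would have to allocate. It is only a sketch: it assumes the vectors are stored as float64 and that the full 300000 x 300000 distance matrix is built in one piece, which is what the traceback suggests but which I have not confirmed for every scikit-learn version.

```python
import numpy as np

# Shape reported above for the DBSCAN input.
n_samples, n_features = 300000, 300

# Assumption: values are float64 (8 bytes each).
itemsize = np.dtype(np.float64).itemsize

# The input matrix itself.
input_bytes = n_samples * n_features * itemsize

# With algorithm='brute', the distance computation produces an
# n_samples x n_samples dense array.
pairwise_bytes = n_samples * n_samples * itemsize

print("input matrix:    %.2f GB" % (input_bytes / 1e9))
print("pairwise matrix: %.2f GB" % (pairwise_bytes / 1e9))
```

Under those assumptions the input is well under 1 GB while the pairwise matrix is several hundred GB, which would be consistent with NumPy refusing to create the array at all. The `algorithm` parameter of `DBSCAN` also accepts `'ball_tree'` and `'kd_tree'`, which may avoid building the full matrix at once, although the comment in the traceback notes that the neighborhood step still has O(n^2) worst-case memory.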