Numpy says there are NaN / Inf values, but I can't find any

Asked: 2017-03-20 18:24:31

Tags: python numpy scikit-learn sparse-matrix

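I have a sparse.csr_matrix. It is made up of three concatenated matrices, one of which was originally csr, while the other two were converted from dense matrices.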
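For readers who want to reproduce the setup, here is a minimal sketch of how such a matrix might be assembled. The question does not show the construction code, so the use of scipy.sparse.hstack, the block shapes, and the names block_a, block_b and block_c are all assumptions made for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the three blocks described in the question.
# block_a plays the role of the matrix that was csr from the start.
block_a = sparse.csr_matrix(np.array([[0.0, 2.0],
                                      [0.0, 0.0],
                                      [5.0, 0.0]]))

# block_b and block_c play the role of the dense matrices converted to csr.
block_b = sparse.csr_matrix(np.arange(6, dtype=np.float64).reshape(3, 2))
block_c = sparse.csr_matrix(np.ones((3, 1)))

# Assumption: the blocks are joined column-wise; format="csr" keeps the
# result in Compressed Sparse Row format.
everything = sparse.hstack([block_a, block_b, block_c], format="csr")

print(everything.shape)
print(everything.format)
```

If the real blocks were stacked row-wise instead, scipy.sparse.vstack would be the counterpart call.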

When running sklearn.ensemble.RandomForestClassifier on a SUBSET of the data (but not on all of it), I get the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

However, when I check, I find:

np.isnan(matrix.data).any()    # => False (there are no NaNs)
np.isfinite(matrix.data).all() # => True  (There are no infinite values)
np.max(matrix.data)            # => 10499 (certainly not too big for floats)

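This holds for both the full data and the subset, which suggests the error message is wrong and the problem lies elsewhere. But where, and why, I still can't tell. Has anyone seen this before?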
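The checks above only look at matrix.data, which is enough for a sparse matrix: entries that are not stored are implicit zeros, and a NaN in a dense source is kept as an explicitly stored value when converted to csr. A small sketch of that behaviour, using a made-up 2x2 array rather than the real data:

```python
import numpy as np
from scipy import sparse

# Toy dense input with one NaN; not taken from the question's data.
dense = np.array([[0.0, 1.0],
                  [np.nan, 0.0]])
m = sparse.csr_matrix(dense)

# The NaN is treated as a nonzero entry, so it ends up in m.data.
stored = m.nnz
has_nan = bool(np.isnan(m.data).any())

# Row-subsetting keeps only the stored values of the selected rows.
first_row = m[[0], :]
first_row_finite = bool(np.isfinite(first_row.data).all())

print(stored, has_nan, first_row_finite)
```

So if these checks pass on the exact row subset passed to fit, the feature matrix itself is unlikely to be what the validation is rejecting.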

Edit:

Exhibit 1: repr(matrix) = "<12785x190428 sparse matrix of type '<type 'numpy.float64'>'\n\twith 2825051 stored elements in Compressed Sparse Row format>"

Exhibit 2: the error stack

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-649153b97fe0> in <module>()
      8     lower = upper
      9 
---> 10     m = rf.fit(everything[train,:], data.label[train])
     11     yhat = m.predict(everything[test,:])
     12     print(np.mean(yhat==data.label[test]))

/usr/local/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
    246         # Validate or convert input data
    247         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
--> 248         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    249         if issparse(X):
    250             # Pre-sort indices to avoid that each individual tree of the

/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    405                              % (array.ndim, estimator_name))
    406         if force_all_finite:
--> 407             _assert_all_finite(array)
    408 
    409     shape_repr = _shape_repr(array.shape)

/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.pyc in _assert_all_finite(X)
     56             and not np.isfinite(X).all()):
     57         raise ValueError("Input contains NaN, infinity"
---> 58                          " or a value too large for %r." % X.dtype)
     59 
     60 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
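
Note that the arrow in the traceback points at the check_array(y, ...) call on line 248 of forest.py, not at the check_array(X, ...) call on line 247. That suggests the value failing validation is the label vector (data.label[train]) rather than the feature matrix, so it may be worth running the same finiteness check on the labels. A minimal sketch, assuming the labels can be cast to a float array; count_non_finite, labels and the two index arrays are hypothetical names and toy values, not part of the original code:

```python
import numpy as np

def count_non_finite(values):
    """Return how many entries are NaN or +/-Inf after casting to float64."""
    arr = np.asarray(values, dtype=np.float64)
    return int(np.count_nonzero(~np.isfinite(arr)))

# Toy label vector with one missing value (stand-in for data.label).
labels = np.array([0.0, 1.0, np.nan, 1.0, 0.0])

# Two hypothetical training subsets: one includes the NaN, one does not.
train_with_gap = np.array([0, 1, 2])
train_clean = np.array([0, 1, 3])

bad_in_first = count_non_finite(labels[train_with_gap])
bad_in_second = count_non_finite(labels[train_clean])
print(bad_in_first, bad_in_second)
```

Applied to the real data, a nonzero count for data.label[train] would be consistent with the traceback, since a missing label that falls only in some subsets would make fit fail on those subsets alone.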

0 answers