Question

I am trying to use ExtraTreesClassifier with sparse data, as the documentation describes, but I get a TypeError at run time asking for dense data. This is with scikit-learn 0.17.1. The documentation says:

Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]

The code is very simple:

import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix, hstack
from sklearn.ensemble import ExtraTreesClassifier

# Small dense feature matrix, then the same data in CSR sparse format
features = np.array([[1, 0], [0, 1], [3, 4]])
sparse_features = csr_matrix(features)
labels = np.array([0, 1, 0])

# Fitting on the sparse matrix is the call that raises the TypeError
classifier = ExtraTreesClassifier()
classifier.fit(sparse_features, labels)

Here is the exception: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. Passing features instead of sparse_features works fine.

Is the documentation out of date, or is there something wrong with the code above?

Any help would be appreciated. Thanks.

Answer 1

Quoting the documentation:

Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.

So I would expect passing a csc_matrix to help.
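A minimal sketch of that suggestion, reusing the arrays from the question. It assumes a scikit-learn and SciPy combination in which tree ensembles accept sparse input; the n_estimators and random_state values are arbitrary choices for illustration and are not part of the question.

```python
import numpy as np
from scipy.sparse import csc_matrix
from sklearn.ensemble import ExtraTreesClassifier

features = np.array([[1, 0], [0, 1], [3, 4]])
labels = np.array([0, 1, 0])

# Build the matrix in CSC format up front, the format the docs say is used internally
sparse_features = csc_matrix(features)

# n_estimators and random_state are illustrative assumptions, not from the question
classifier = ExtraTreesClassifier(n_estimators=10, random_state=0)
classifier.fit(sparse_features, labels)

# Predict on the same sparse data to confirm the fitted model accepts it
predictions = classifier.predict(sparse_features)
print(predictions)
```

If this still raises the same TypeError, the matrix format is probably not the cause.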

In my setup both versions work (csc and csr, sklearn 0.17.1), so I think the problem may be an older version of scipy.
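To compare setups, it may help to print the installed versions of both libraries first. This is only a quick diagnostic sketch; it does not by itself show which SciPy version is required.

```python
import scipy
import sklearn

# Report the versions in use so they can be compared against a working setup
sklearn_version = sklearn.__version__
scipy_version = scipy.__version__

print("scikit-learn:", sklearn_version)
print("scipy:", scipy_version)
```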
