LDA Python library does not accept a sparse matrix as input

Time: 2015-07-26 21:37:48

Tags: python sparse-matrix lda

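I am trying to use the lda 1.0.2 package for Python.

The documentation says that sparse matrices are acceptable, but when I pass a sparse matrix to the transform() function, it throws the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().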

The transform() function works fine with a regular (dense) matrix.

Has anyone else run into a similar problem?

Any help would be great! Thanks in advance :)

2 answers:

Answer 0 (score: 0)

I got the same error. To reproduce:

from scipy.sparse import csr_matrix
import lda

X = csr_matrix([[1,0],[0,1]])
lda_test = lda.LDA(n_topics=2, n_iter=10)
lda_test.fit(X)
X_trans = lda_test.transform(X)

This produces the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-a1a0875bac02> in <module>()
      5 lda_test = lda.LDA(n_topics=2, n_iter=10)
      6 lda_test.fit(X)
----> 7 X_trans = lda_test.transform(X)

C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\lda.pyc in transform(self, X, max_iter, tol)
    173         n_topics = len(self.components_)
    174         doc_topic = np.empty((len(X), n_topics))
--> 175         WS, DS = lda.utils.matrix_to_lists(X)
    176         # TODO: this loop is parallelizable
    177         for d in range(len(X)):

C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\utils.pyc in matrix_to_lists(doc_word)
     44     if np.count_nonzero(doc_word.sum(axis=1)) != doc_word.shape[0]:
     45         logger.warning("all zero row in document-term matrix found")
---> 46     if np.count_nonzero(doc_word.sum(axis=0)) != doc_word.shape[1]:
     47         logger.warning("all zero column in document-term matrix found")
     48     sparse = True

C:\Users\lidw6lw\PortablePython\App\lib\site-packages\numpy\core\_methods.pyc in _sum(a, axis, dtype, out, keepdims)
     23 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
     24     return um.add.reduce(a, axis=axis, dtype=dtype,
---> 25                             out=out, keepdims=keepdims)
     26 
     27 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

C:\Users\lidw6lw\PortablePython\App\lib\site-packages\scipy\sparse\base.pyc in __bool__(self)
    181             return True if self.nnz == 1 else False
    182         else:
--> 183             raise ValueError("The truth value of an array with more than one "
    184                              "element is ambiguous. Use a.any() or a.all().")
    185     __nonzero__ = __bool__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

The error surfaces inside lda.utils.matrix_to_lists, which transform() calls.
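For context, matrix_to_lists turns the document-term matrix into two flat arrays with one entry per token: WS holds word (column) indices and DS holds document (row) indices. The sketch below is a hypothetical re-implementation of that idea, not the library's actual code (the name matrix_to_lists_sketch is made up), showing that the conversion itself is straightforward for both dense and scipy sparse input.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def matrix_to_lists_sketch(doc_word):
    """Expand a document-term count matrix into parallel token index lists.

    Hypothetical helper, not lda.utils.matrix_to_lists itself.
    Returns (WS, DS): WS[i] is the word (column) index and DS[i] the document
    (row) index of the i-th token. A count c at (d, w) contributes c tokens.
    """
    if issparse(doc_word):
        # COO format exposes the row/column/value triplets of the nonzeros.
        coo = doc_word.tocoo()
        rows, cols, counts = coo.row, coo.col, coo.data
    else:
        doc_word = np.asarray(doc_word)
        rows, cols = np.nonzero(doc_word)
        counts = doc_word[rows, cols]
    counts = counts.astype(np.intp)
    WS = np.repeat(cols, counts)  # one entry per token
    DS = np.repeat(rows, counts)
    return WS, DS


X = csr_matrix([[1, 0], [0, 2]])
print(matrix_to_lists_sketch(X))            # sparse input
print(matrix_to_lists_sketch(X.toarray()))  # same result for dense input
```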

Both of the following work fine:

X_trans = lda_test.transform(X.toarray())
X_trans2 = lda_test.fit_transform(X)
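Calling X.toarray() on a large corpus can use a lot of memory. A minimal sketch of a middle ground densifies only a block of rows at a time; it assumes, as the per-document loop in transform() suggests, that each document is transformed independently. transform_in_chunks and chunk_size are hypothetical names, not part of the lda API.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def transform_in_chunks(model, X, chunk_size=1000):
    """Call model.transform on dense row blocks of a (possibly sparse) X.

    Hypothetical workaround helper. Only chunk_size rows are densified at a
    time, which bounds the extra memory compared with X.toarray() on the
    whole matrix.
    """
    if not issparse(X):
        return model.transform(X)
    X = csr_matrix(X)  # CSR supports cheap row slicing
    if X.shape[0] == 0:
        return model.transform(X.toarray())
    parts = []
    for start in range(0, X.shape[0], chunk_size):
        block = X[start:start + chunk_size].toarray()
        parts.append(model.transform(block))
    return np.vstack(parts)  # one row of topic weights per document
```

Usage with the reproduction above would be X_trans = transform_in_chunks(lda_test, X, chunk_size=500).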

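Edit: Actually, the transform function doesn't handle sparse matrices properly. Make a copy of the package, and in the code of transform replace len(X) with X.shape[0] and comment out the np.atleast_2d(X) line. The part of transform right below the docstring then looks like this: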

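# X = np.atleast_2d(X)  # commented out: this call does not preserve a scipy sparse matrix
phi = self.components_
alpha = self.alpha
# for debugging, let's not worry about the documents
n_topics = len(self.components_)
# len() is not defined for sparse matrices, so use shape[0] (number of documents)
doc_topic = np.empty((X.shape[0], n_topics))
WS, DS = lda.utils.matrix_to_lists(X)
# TODO: this loop is parallelizable
for d in range(X.shape[0]):
    ...  # rest of the loop body unchanged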
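The len(X) to X.shape[0] change matters because scipy deliberately leaves len() undefined for sparse matrices, while shape[0] gives the number of rows (documents) for both ndarrays and sparse matrices. A small self-contained check:

```python
import numpy as np
from scipy.sparse import csr_matrix

X_dense = np.array([[1, 0], [0, 1], [2, 3]])
X_sparse = csr_matrix(X_dense)

# len() counts the rows of an ndarray...
n_dense = len(X_dense)

# ...but raises TypeError for a scipy sparse matrix.
try:
    len(X_sparse)
    len_raises = False
except TypeError:
    len_raises = True

# shape[0] is the number of rows (documents) for both types.
n_sparse = X_sparse.shape[0]
print(n_dense, n_sparse, len_raises)  # 3 3 True
```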

Answer 1 (score: 0)

I recently ran into a similar error:

ValueError: expected sparse matrix with integer values, found float values

This solved the problem:

model.fit(X.toarray().astype(int))
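Note that X.toarray() densifies the whole matrix, and astype(int) silently truncates any fractional values. If the float values are really whole-number counts, a sketch that keeps the matrix sparse is shown here; to_integer_counts is a hypothetical helper, and it assumes lda's check only requires an integer dtype, as the error message suggests.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_integer_counts(X):
    """Return X as an integer-valued CSR matrix without densifying it.

    Hypothetical helper. Raises ValueError if any stored value is not a whole
    number, rather than silently truncating it the way astype(int) would.
    """
    X = csr_matrix(X)
    if not np.array_equal(X.data, np.round(X.data)):
        raise ValueError("document-term matrix contains non-integer values")
    return X.astype(np.int64)


X_float = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
X_int = to_integer_counts(X_float)
print(X_int.dtype, X_int.toarray().tolist())  # int64 [[1, 0], [0, 2]]
```

With this, the fit call would become model.fit(to_integer_counts(X)).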