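I am trying to use the lda 1.0.2 package for Python.
The documentation says that sparse matrices are accepted, but when I pass a sparse matrix to the transform() function, it raises the error
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
transform() works fine with a regular (dense) matrix.
Has anyone else run into a similar problem?
Any help would be great! Thanks in advance :)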
Answer 0 (score: 0)
I get the same error. To reproduce:
from scipy.sparse import csr_matrix
import lda
X = csr_matrix([[1,0],[0,1]])
lda_test = lda.LDA(n_topics=2, n_iter=10)
lda_test.fit(X)
X_trans = lda_test.transform(X)
This produces the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-a1a0875bac02> in <module>()
5 lda_test = lda.LDA(n_topics=2, n_iter=10)
6 lda_test.fit(X)
----> 7 X_trans = lda_test.transform(X)
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\lda.pyc in transform(self, X, max_iter, tol)
173 n_topics = len(self.components_)
174 doc_topic = np.empty((len(X), n_topics))
--> 175 WS, DS = lda.utils.matrix_to_lists(X)
176 # TODO: this loop is parallelizable
177 for d in range(len(X)):
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\utils.pyc in matrix_to_lists(doc_word)
44 if np.count_nonzero(doc_word.sum(axis=1)) != doc_word.shape[0]:
45 logger.warning("all zero row in document-term matrix found")
---> 46 if np.count_nonzero(doc_word.sum(axis=0)) != doc_word.shape[1]:
47 logger.warning("all zero column in document-term matrix found")
48 sparse = True
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\numpy\core\_methods.pyc in _sum(a, axis, dtype, out, keepdims)
23 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
24 return um.add.reduce(a, axis=axis, dtype=dtype,
---> 25 out=out, keepdims=keepdims)
26
27 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\scipy\sparse\base.pyc in __bool__(self)
181 return True if self.nnz == 1 else False
182 else:
--> 183 raise ValueError("The truth value of an array with more than one "
184 "element is ambiguous. Use a.any() or a.all().")
185 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
It looks like the error is triggered in lda.utils.matrix_to_lists.
Both of the following work fine:
X_trans = lda_test.transform(X.toarray())  # densify before calling transform()
X_trans2 = lda_test.fit_transform(X)
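If the matrix is too large to densify in one go, the X.toarray() workaround can probably be applied in row batches instead. This is only a sketch: transform_in_dense_batches and batch_size are hypothetical names, not part of lda. It assumes that model.transform() accepts a dense 2-D array, and that the result for each document does not depend on which other rows are passed along with it.

```python
import numpy as np
from scipy.sparse import issparse


def transform_in_dense_batches(model, X, batch_size=1000):
    """Call model.transform() on dense row batches of a (possibly sparse) X.

    `model` is assumed to be an already fitted object whose transform()
    accepts a dense 2-D array and returns one output row per input row.
    """
    if not issparse(X):
        # Dense input is passed through unchanged.
        return np.asarray(model.transform(X))

    X = X.tocsr()  # CSR supports efficient row slicing
    parts = []
    for start in range(0, X.shape[0], batch_size):
        # Only this slice of rows is held in memory as a dense array.
        dense_block = X[start:start + batch_size].toarray()
        parts.append(np.asarray(model.transform(dense_block)))
    return np.vstack(parts)
```

With the objects from the reproduction above, the call would be X_trans = transform_in_dense_batches(lda_test, X).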
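Edit: in fact, transform does not handle sparse matrices correctly. Make a copy of the package and, in the code of transform, replace len(X) with X.shape[0] and comment out the np.atleast_2d(X) line. The part of transform directly below the docstring then looks like this: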
# X = np.atleast_2d(X)
phi = self.components_
alpha = self.alpha
# for debugging, let's not worry about the documents
n_topics = len(self.components_)
doc_topic = np.empty((X.shape[0], n_topics))
WS, DS = lda.utils.matrix_to_lists(X)
# TODO: this loop is parallelizable
for d in range(X.shape[0]):
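A quick check of why those two lines are a problem for sparse input. It uses only NumPy and SciPy and does not call lda at all; it just shows how len() and np.atleast_2d() treat a csr_matrix. The variable names are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix([[1, 0], [0, 1]])

# len() is not defined for SciPy sparse matrices, so len(X) raises TypeError.
try:
    len(X)
    len_works = True
except TypeError:
    len_works = False

# shape[0] gives the number of rows (documents) for dense and sparse input alike.
n_docs = X.shape[0]

# np.atleast_2d() does not densify a sparse matrix; it wraps the whole
# matrix object in a NumPy array with dtype=object.
wrapped = np.atleast_2d(X)

print(len_works, n_docs, wrapped.dtype)
```

Once X has been wrapped like this, later calls such as sum() and np.count_nonzero() operate on the wrapper instead of on the counts. That would be consistent with the traceback ending in scipy's __bool__, though I have not traced it line by line.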
Answer 1 (score: 0)
I recently ran into a similar error:
ValueError: expected sparse matrix with integer values, found float values
This solved the problem:
model.fit(X.toarray().astype(int))
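Note that astype(int) silently truncates any non-integer values, and toarray() gives up the sparse storage. A possible alternative is to check the values first and cast the sparse matrix directly. This is a sketch, not something taken from lda: to_integer_counts is a hypothetical helper, and whether fit() accepts the resulting integer sparse matrix is an assumption based on the error message and on the documentation mentioned in the question.

```python
import numpy as np
from scipy.sparse import issparse


def to_integer_counts(X):
    """Return X cast to an integer dtype, refusing to truncate silently.

    Works for dense arrays and SciPy sparse matrices; sparse input stays sparse.
    """
    if issparse(X):
        X = X.tocsr()
        values = X.data  # only the explicitly stored entries need checking
    else:
        X = np.asarray(X)
        values = X

    # Every stored value must already be a whole number (1.0, 2.0, ...).
    if not np.all(np.mod(values, 1) == 0):
        raise ValueError("X contains non-integer values; casting would truncate them")

    return X.astype(int)
```

The call would then be model.fit(to_integer_counts(X)), with model and X as in the line above.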