While processing some text data, I am trying to join a NumPy array (taken from a pandas Series) to a CSR matrix.
This is what I have done so far.
#create a compatible sparse matrix from my np.array.
#sparse.csr_matrix(X['link'].values) returns array size (1,7395)
#transpose that array for (7395,1)
X = sparse.csr_matrix(X['link'].values.transpose)
#bodies is a sparse.csr_matrix with shape (7395, 20000)
bodies = sparse.hstack((bodies,X))
However, this line raises the error no supported conversion for types: (dtype('O'),). I am not sure what this means. How can I fix it?
Thanks.
Answer 0 (score: 2)
This is Saullo Castro's comment, posted as an answer:
x = np.arange(12).reshape(1,12) # ndarray
sparse.csr_matrix(x)
Out[14]: <1x12 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
x.transpose # function, not ndarray
Out[15]: <function transpose>
X = sparse.csr_matrix(x.transpose)
TypeError: no supported conversion for types: (dtype('O'),)
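The error occurs before hstack is ever used, when the code tries to create a sparse matrix from a function rather than from an ndarray. The mistake is the missing () after transpose.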
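To see why the missing parentheses matter, here is a minimal sketch that needs only NumPy (the variable names are illustrative, not from the question). Without the call, x.transpose is a bound method, and NumPy can only wrap such an object in an array of dtype object, which is presumably where the dtype('O') in the message comes from.

```python
import numpy as np

x = np.arange(12).reshape(1, 12)

method = x.transpose            # no parentheses: a bound method, not an array
wrapped = np.asarray(method)    # NumPy wraps the method in a 0-d object array

print(callable(method))         # True
print(wrapped.dtype)            # object
print(x.transpose().shape)      # (12, 1)
```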
# x.transpose() == x.T # ndarray
sparse.csr_matrix(x.transpose())
Out[17]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
sparse.csr_matrix(x.T)
Out[18]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
bodies = sparse.rand(12,3,format='csr',density=.1)
sparse.hstack((bodies,X))
Out[32]: <12x4 sparse matrix of type '<type 'numpy.float64'>'
with 14 stored elements in COOrdinate format>
csr_matrix works fine when it is given the transposed array.
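Applied to the original question, the fix might look like the sketch below. The arrays here are small hypothetical stand-ins for X['link'].values and bodies, and the column is assumed to hold numeric values. reshape(-1, 1) is used instead of transpose() because a pandas Series yields a 1-D array, which reshape turns into a column directly.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: 'link' plays the role of X['link'].values,
# 'bodies' the role of the (7395, 20000) matrix from the question.
link = np.array([1, 0, 1, 1, 0])
bodies = sparse.csr_matrix(np.eye(5))

# Turn the 1-D array into a (5, 1) column before making it sparse.
link_col = sparse.csr_matrix(link.reshape(-1, 1))

# hstack returns COO by default; format="csr" keeps the result in CSR.
combined = sparse.hstack((bodies, link_col), format="csr")

print(link_col.shape)   # (5, 1)
print(combined.shape)   # (5, 6)
```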
Answer 1 (score: 0)
import numpy as np
import pandas as pd
from scipy import sparse
d = {
"a": 30,
"b": 20,
"c": 10
}
s = pd.Series(d, index=["c", "b", "a"])
print s
--output:--
c 10
b 20
a 30
dtype: int64
my_ndarray = s.values
print my_ndarray
--output:--
[10 20 30]
X = sparse.csr_matrix(my_ndarray).transpose()
print X.todense()
--output:--
[[10]
[20]
[30]]
bodies = sparse.csr_matrix([
[0, 1],
[1, 0],
[0, 0]
])
print bodies.todense()
--output:--
[[0 1]
[1 0]
[0 0]]
result = sparse.hstack((bodies,X))
print result.todense()
--output:--
[[ 0 1 10]
[ 1 0 20]
[ 0 0 30]]
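Writing this instead:
X = sparse.csr_matrix(my_ndarray.transpose())
produces an error, because transposing a 1-D ndarray is a no-op, so X ends up with shape (1, 3) rather than (3, 1):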
Traceback (most recent call last):
File "1.py", line 33, in <module>
result = sparse.hstack((bodies,X))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/construct.py", line 417, in hstack
return bmat([blocks], format=format, dtype=dtype)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/construct.py", line 515, in bmat
raise ValueError('blocks[%d,:] has incompatible row dimensions' % i)
ValueError: blocks[0,:] has incompatible row dimensions
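The shape mismatch behind that traceback can be checked directly. This is a small sketch with the same three values as above; it compares transposing the ndarray before building the sparse matrix with transposing the sparse matrix afterwards.

```python
import numpy as np
from scipy import sparse

values = np.array([10, 20, 30])          # 1-D, like s.values

# transpose() on a 1-D array changes nothing, so the result is still 1-D.
print(values.transpose().shape)          # (3,)

# A 1-D input becomes a single row, which cannot be hstacked with 3 rows.
as_row = sparse.csr_matrix(values.transpose())
print(as_row.shape)                      # (1, 3)

# Transposing the sparse matrix itself gives the column that hstack needs.
as_col = sparse.csr_matrix(values).transpose()
print(as_col.shape)                      # (3, 1)
```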
Compare:
import numpy as np
import pandas as pd
from scipy import sparse
d = {
"a": "hello",
"b": "world",
"c": "goodbye"
}
s = pd.Series(d, index=["c", "b", "a"])
print s
--output:--
c goodbye
b world
a hello
my_ndarray = s.values
print my_ndarray
--output:--
[goodbye world hello]
X = sparse.csr_matrix(s.values).transpose()
--output:--
Traceback (most recent call last):
File "1.py", line 19, in <module>
X = sparse.csr_matrix(s.values).transpose()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 66, in __init__
self._set_self( self.__class__(coo_matrix(arg1, dtype=dtype)) )
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 30, in __init__
arg1 = arg1.asformat(self.format)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/base.py", line 203, in asformat
return getattr(self,'to' + format)()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/coo.py", line 312, in tocsr
data = np.empty(self.nnz, dtype=upcast(self.dtype))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/sputils.py", line 53, in upcast
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('object'),)
The fact that you did not provide an example like this yourself means that you did not put enough work into debugging the problem.
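If the column really does hold strings, as in the second example, it has to be turned into numbers before scipy can store it. The sketch below checks the dtype first and then maps each distinct string to an integer code with np.unique. That encoding is only one possible choice and an assumption on my part (it imposes an arbitrary ordering on the labels), not something the question asked for.

```python
import numpy as np
from scipy import sparse

labels = np.array(["goodbye", "world", "hello"], dtype=object)

# An object dtype is the warning sign: scipy.sparse cannot store it.
print(labels.dtype)                      # object

# Map each distinct string to an integer code (sorted order of the labels).
uniques, codes = np.unique(labels, return_inverse=True)
print(codes)                             # [0 2 1]

# The integer codes can now be stored as a sparse column.
col = sparse.csr_matrix(codes.reshape(-1, 1))
print(col.shape)                         # (3, 1)
```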