I have a list of numbers with len(lex) = 6064
that looks like this:
[0,
0,
1,
0,
0,
-1,
1,
1,
0,
0,
0,
0,
1,
0,]
and a CSR matrix:
tweets.shape = (6064, 2500)
How can I merge them? I tried converting both to lists, but I get an error when I try to work with the result:
tweets = list(tweets)
lex = list(lex)
tweets_final = np.column_stack([tweets, lex])
After splitting the training data, I get the following error:
nb.fit(X_train, y_train)
ValueError: setting an array element with a sequence.
How can I add this list as a column of the matrix?
Answer 0 (score: 4)
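You can use scipy.sparse.hstack to stack the two horizontally (column-wise). The list only needs to be converted first, either to a column vector stored as a sparse matrix or to a 2D array with a single column: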
scipy.sparse.hstack(( tweets, csr_matrix(lex).T ))
scipy.sparse.hstack(( tweets, np.asarray(lex)[:,None] ))
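To make the shapes concrete, here is a minimal sketch of the same idea on made-up data. The small random `tweets` matrix and the short `lex` list are stand-ins for the question's data, and `lex_col` is just an illustrative name. Passing format="csr" asks hstack to return a CSR result directly.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's data (tiny shapes for brevity)
tweets = sparse.random(8, 5, density=0.3, format="csr", random_state=0)
lex = [0, 0, 1, 0, 0, -1, 1, 1]

# Turn the list into an (n_rows, 1) sparse column
lex_col = sparse.csr_matrix(np.asarray(lex).reshape(-1, 1))

# Stack column-wise; both blocks must have the same number of rows
tweets_final = sparse.hstack([tweets, lex_col], format="csr")

print(tweets_final.shape)
```

The result stays sparse, so the 6064 x 2500 matrix from the question never has to be converted to a dense array or a Python list.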
Sample run:
In [188]: import numpy as np
In [189]: from scipy.sparse import csr_matrix
In [194]: import scipy as sp
In [190]: a = np.random.randint(0,4,(5,10))
In [192]: a
Out[192]:
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1]])
In [193]: b = [9,8,7,6,5] # equivalent to lex
In [191]: A = csr_matrix(a) # equivalent to tweets
In [195]: sp.sparse.hstack(( A, csr_matrix(b).T ))
Out[195]:
<5x11 sparse matrix of type '<type 'numpy.int64'>'
with 42 stored elements in COOrdinate format>
In [197]: _.toarray() # verify values by converting to dense array
Out[197]:
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1, 9],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3, 8],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1, 7],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1, 6],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1, 5]])
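One follow-up worth noting: the stacked result in the run above is reported in COOrdinate format, which does not support row slicing. If the rows are going to be split into training and test sets, converting to CSR first is a reasonable precaution. The sketch below uses made-up data, a plain 3/2 row split chosen only for illustration, and the hypothetical names `stacked`, `train_rows` and `test_rows`. The default output format of hstack may differ between SciPy versions, so the explicit conversion is there to be safe.

```python
import numpy as np
from scipy import sparse

# Made-up data: A plays the role of tweets, b the role of lex
A = sparse.csr_matrix(np.arange(20).reshape(5, 4))
b = [9, 8, 7, 6, 5]

# Result format may be COO, as in the sample run above
stacked = sparse.hstack((A, sparse.csr_matrix(b).T))

# CSR supports efficient row indexing and slicing
stacked = stacked.tocsr()

train_rows = stacked[:3]
test_rows = stacked[3:]
print(train_rows.shape, test_rows.shape)
```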