How to merge a list and a csr matrix

Time: 2017-09-04 18:14:24

Tags: python numpy matrix scipy

I have a list of numbers, with len(lex) = 6064, that looks like this:

[0,
 0,
 1,
 0,
 0,
 -1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,]

and a csr matrix:

tweets.shape = (6064, 2500)

How can I merge them? I tried converting both to lists, but I get an error when I try to work with the result:

tweets = list(tweets)
lex = list(lex)
tweets_final = np.column_stack([tweets, lex])

After I split the training data, I get the following error:

nb.fit(X_train, y_train)


ValueError: setting an array element with a sequence.

How can I add this list as a column of the matrix?

1 answer:

Answer 0: (score: 4)

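You can use scipy.sparse.hstack to stack the two horizontally (column-wise). The list just needs to be converted first, either to a column vector stored as a sparse matrix or to a 2D array with a single column -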

scipy.sparse.hstack(( tweets, csr_matrix(lex).T ))

scipy.sparse.hstack(( tweets, np.asarray(lex)[:,None] ))
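To make the shapes concrete, here is a minimal self-contained sketch of the first variant. The data is synthetic: `tweets` and `lex` below are random stand-ins with the sizes given in the question, not the asker's real data, and `lex_col` and `tweets_final` are names chosen for illustration. Passing `format="csr"` is an optional choice made here so the result supports row slicing later.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Synthetic stand-ins for the question's data (assumed, not the real data)
n_rows, n_cols = 6064, 2500
tweets = sparse.random(n_rows, n_cols, density=0.001, format="csr", random_state=0)
lex = rng.integers(-1, 2, size=n_rows).tolist()  # plain list with values in {-1, 0, 1}

# Turn the list into an (n_rows, 1) sparse column
lex_col = sparse.csr_matrix(np.asarray(lex, dtype=float).reshape(-1, 1))

# Stack column-wise; the list becomes the last column of the result
tweets_final = sparse.hstack((tweets, lex_col), format="csr")

print(tweets_final.shape)
```

The number of rows has to match on both sides, which is why the list is reshaped into a single column before stacking.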

Sample run -

In [188]: import numpy as np

In [189]: from scipy.sparse import csr_matrix

In [194]: import scipy as sp

In [190]: a = np.random.randint(0,4,(5,10))

In [192]: a
Out[192]: 
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1]])

In [193]: b = [9,8,7,6,5]  # equivalent to lex

In [191]: A = csr_matrix(a)  # equivalent to tweets

In [195]: sp.sparse.hstack(( A, csr_matrix(b).T ))
Out[195]: 
<5x11 sparse matrix of type '<type 'numpy.int64'>'
    with 42 stored elements in COOrdinate format>

In [197]: _.toarray() # verify values by converting to dense array
Out[197]: 
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1, 9],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3, 8],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1, 7],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1, 6],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1, 5]])
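Note that the stacked result in the sample run is reported in COOrdinate format, which does not support row indexing. Since the question goes on to split the data into training and test parts, it may help to convert the result to CSR first. The sketch below assumes a small made-up matrix and a hand-written split at row 3, with illustrative names (`stacked_csr`, `X_train`, `X_test`); it is not the asker's actual pipeline.

```python
import numpy as np
from scipy import sparse

# Small made-up data: a plays the role of tweets, b the role of lex
a = np.array([[2, 1, 0],
              [0, 2, 1],
              [0, 0, 3],
              [1, 0, 2]])
b = [9, 8, 7, 6]

stacked = sparse.hstack((sparse.csr_matrix(a), sparse.csr_matrix(b).T))

# Convert to CSR so that rows can be sliced
stacked_csr = stacked.tocsr()

# Illustrative manual split: first three rows for training, the rest for testing
X_train, X_test = stacked_csr[:3], stacked_csr[3:]

print(X_train.shape, X_test.shape)
```

Calling `.tocsr()` is harmless if the matrix already happens to be in CSR format, so it can be applied unconditionally.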