term_map keeps track of which term is at which position.
In [256]: term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
In [257]: term_map
Out[257]: array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores keeps track of the weight of each term at each position.
In [258]: term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
In [259]: term_scores
Out[259]: array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
In [260]: unqID, idx = np.unique(term_map, return_inverse=True)
In [261]: unqID
Out[261]: array([0, 2, 3, 4])
In [262]: value_sums = np.bincount(idx, term_scores)
In [263]: value_sums
Out[263]: array([ 4., 16., 9., 21.])
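For readers less familiar with pairing np.unique and np.bincount, the grouped sum above can be spelled out as a plain loop. This is only an illustrative sketch; the dictionary name sums_by_term is hypothetical and does not come from the original post.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

# Hypothetical helper dict: term ID -> running total of its scores.
sums_by_term = {}
for term, score in zip(term_map.tolist(), term_scores.tolist()):
    sums_by_term[term] = sums_by_term.get(term, 0) + score

# The vectorized version from the question: idx maps each position to
# the slot of its term ID in the sorted unique array unqID.
unqID, idx = np.unique(term_map, return_inverse=True)
value_sums = np.bincount(idx, weights=term_scores)

# Both routes should agree for every unique term ID.
for term, total in zip(unqID.tolist(), value_sums.tolist()):
    assert sums_by_term[term] == total
```

The loop makes the intent explicit (one total per term ID), while the vectorized form does the same work without a Python-level loop.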
The vocab array is indexed by the values in the term_map variable.
In [254]: vocab = np.zeros(13)
In [255]: vocab
Out[255]: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
The goal is to write these sums into the vocab variable.
In [255]: updated_vocab
Out[255]: array([ 4., 0., 16., 9., 21., 0., 0., 0., 0., 0., 0., 0., 0.])
How can I create this?
Answer 0 (score: 3)
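It turns out we can skip the np.unique step. Feeding term_map and term_scores straight into np.bincount, with the optional minlength argument setting the length of the output array, gives the desired output directly.
So we can simply do: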
final_output = np.bincount(term_map, term_scores, minlength=13)
Sample run:
In [142]: term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
...: term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
...:
In [143]: np.bincount(term_map, term_scores, minlength=13)
Out[143]:
array([ 4., 0., 16., 9., 21., 0., 0., 0., 0., 0., 0.,
0., 0.])
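As a sanity check, the sketch below compares the single np.bincount call against np.add.at, another NumPy routine that accumulates over repeated indices. np.add.at is not part of this answer; it appears here only as a cross-check. The name vocab_size is an assumption, with 13 taken from the length of vocab in the question.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
vocab_size = 13  # assumed name; 13 is the length of vocab in the question

# One call: sum the weights per index and pad the result to vocab_size.
via_bincount = np.bincount(term_map, weights=term_scores, minlength=vocab_size)

# Cross-check: np.add.at does unbuffered in-place addition,
# so repeated indices accumulate instead of overwriting each other.
via_add_at = np.zeros(vocab_size)
np.add.at(via_add_at, term_map, term_scores)

assert np.array_equal(via_bincount, via_add_at)
print(via_bincount)
```

Note that minlength is only a lower bound: if term_map contained an index of 13 or more, np.bincount would return a longer array than vocab.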
Answer 1 (score: 2)
import numpy as np
term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
unqID, idx = np.unique(term_map, return_inverse=True)
value_sums = np.bincount(idx, term_scores)
vocab = np.zeros(13)
vocab[unqID] = value_sums
print(vocab)
OUT: [ 4. 0. 16. 9. 21. 0. 0. 0. 0. 0. 0. 0. 0.]
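One thing to keep in mind with the assignment vocab[unqID] = value_sums is that it overwrites whatever vocab held at those positions. The question starts from zeros, so that is fine there. If vocab might already contain totals (an assumption, not something the question states), a sketch that accumulates instead could look like this; the starting value of 10.0 is made up for illustration.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

vocab = np.zeros(13)
vocab[2] = 10.0  # hypothetical pre-existing total for term 2

unqID, idx = np.unique(term_map, return_inverse=True)
value_sums = np.bincount(idx, weights=term_scores)

# unqID contains no repeated indices, so in-place += through
# fancy indexing adds each sum exactly once.
vocab[unqID] += value_sums
print(vocab)
```

The same += would not be safe with term_map itself as the index, because fancy-indexed += does not accumulate over repeated indices.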