Question

I am converting words to vectors. I need the result as an array of ints, but I am getting an array of objects instead.

Can anyone help me solve this?

import numpy as np

def word2idx(statement):
    # sp is an already-loaded sentencepiece SentencePieceProcessor
    id1 = np.asarray(sp.encode_as_ids(statement)).astype(np.int32)
    return id1

sentence = 'the world', 'hello cherry', 'make me proud'
id2 = [word2idx(s) for s in sentence]
print(id2)

Actual output:

[[array([  34, 1867]), array([ 83, 184,  63,  50,  47,  71,  41]), array([328,  69,   7, 303, 649])]]

Expected output:

[[ 34, 1867], [ 83, 184,  63,  50,  47,  71,  41], [328,  69,   7, 303, 649]]

Answer 1

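The problem is that the arrays have different lengths, so NumPy cannot build a tensor (a regular multi-dimensional array) out of them.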
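To illustrate the point, here is a small sketch comparing equal-length rows with ragged rows. The variable names `equal` and `ragged` are made up for this example, and `dtype=object` is passed explicitly because recent NumPy versions refuse to build an array from ragged nested lists otherwise.

```python
import numpy as np

# Rows of equal length form a regular 2-D integer array.
equal = np.array([[34, 1867], [83, 184]])
print(equal.shape)   # (2, 2)

# Rows of different lengths cannot form a 2-D array. With dtype=object
# NumPy stores each row as a separate Python object in a 1-D array.
ragged = np.array([[34, 1867], [83, 184, 63]], dtype=object)
print(ragged.shape)  # (2,)
print(ragged.dtype)  # object
```

This is why the result shows up as an array of objects rather than an array of ints.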

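If you are happy with a list of lists and do not need a NumPy array, you can do the following:

# dtype=object is needed because recent NumPy versions raise a ValueError for ragged nested lists
id2 = np.array([[  34, 1867], [ 83, 184,  63,  50,  47,  71,  41]], dtype=object)
id2.tolist()

and get: [[34, 1867], [83, 184, 63, 50, 47, 71, 41]].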
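In the question, `id2` is a list of separate NumPy arrays rather than nested lists, so one possible approach is to call `tolist()` on each array. This is only a sketch: the arrays below are hard-coded stand-ins for what `word2idx` returns, and `id2_lists` is a name chosen for the example.

```python
import numpy as np

# Stand-in for the output of word2idx: int32 arrays of different lengths.
id2 = [
    np.array([34, 1867], dtype=np.int32),
    np.array([83, 184, 63, 50, 47, 71, 41], dtype=np.int32),
    np.array([328, 69, 7, 303, 649], dtype=np.int32),
]

# ndarray.tolist() converts each array into a plain list of Python ints.
id2_lists = [ids.tolist() for ids in id2]
print(id2_lists)
# [[34, 1867], [83, 184, 63, 50, 47, 71, 41], [328, 69, 7, 303, 649]]
```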

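If you need a dense NumPy array, you have to pad all sequences to the same length. You can do the following:

# dtype=object is needed because recent NumPy versions raise a ValueError for ragged nested lists
id2 = np.array([[  34, 1867], [ 83, 184,  63,  50,  47,  71,  41]], dtype=object)
# np.zeros defaults to float64, so the padded result holds floats
idx = np.zeros((len(id2), max(len(s) for s in id2)))
for i, sent_ids in enumerate(id2):
    # copy each sequence into the start of its row; the rest stays 0
    idx[i, :len(sent_ids)] = sent_ids

In this case, you will get: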

array([[  34., 1867.,    0.,    0.,    0.,    0.,    0.],
       [  83.,  184.,   63.,   50.,   47.,   71.,   41.]])
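The output above is floating point because `np.zeros` defaults to `float64`. If an integer array is wanted, as the question asks, one option is to allocate the padded array with an explicit integer dtype. The helper below is a sketch under that assumption; `pad_sequences`, `pad_value`, and `padded` are hypothetical names for this example, not part of NumPy or sentencepiece.

```python
import numpy as np

def pad_sequences(seqs, pad_value=0, dtype=np.int32):
    """Hypothetical helper: right-pad sequences into a dense 2-D array."""
    max_len = max(len(s) for s in seqs)
    # np.full allocates the array already filled with the padding value.
    out = np.full((len(seqs), max_len), pad_value, dtype=dtype)
    for i, s in enumerate(seqs):
        # Copy each sequence into the start of its row.
        out[i, :len(s)] = s
    return out

padded = pad_sequences([[34, 1867], [83, 184, 63, 50, 47, 71, 41]])
print(padded.dtype)  # int32
print(padded.tolist())
# [[34, 1867, 0, 0, 0, 0, 0], [83, 184, 63, 50, 47, 71, 41]]
```

Note that padding with 0 assumes that id 0 is not a meaningful token in your vocabulary; otherwise pick a different `pad_value`.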
