Question

I want to implement a function that takes a list of words and returns a tensor with dimensions (#words, length of the longest word, 26).

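The idea is to create, for each word, a (length of the longest word, 26) tensor in which each row is all zeros except for a single 1 marking the letter at that position. For example, the word "abc" would be represented by the following tensor: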

tensor([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0]], dtype=torch.int32)
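A minimal sketch of encoding a single word this way, assuming lowercase ASCII letters only. `encode_word` is a hypothetical helper name; `torch.nn.functional.one_hot` expands each letter index into a row with a single 1.

```python
import torch
import torch.nn.functional as F

def encode_word(word: str) -> torch.Tensor:
    # Hypothetical helper: map 'a'..'z' to 0..25 (assumes lowercase ASCII only)
    idx = torch.tensor([ord(c) - ord('a') for c in word], dtype=torch.long)
    # one_hot turns each index into a length-26 row containing a single 1
    return F.one_hot(idx, num_classes=26).int()

print(encode_word("abc"))
```

This produces one row per letter; padding to the longest word still has to be added on top.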

Every "row" of the resulting tensor (representing one word) must have the same size, so I pad each word with rows of zeros.

For example, for the input list ["cd", "abc"], the resulting tensor should be:

tensor([[[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0]],
        [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0]]], dtype=torch.int32)
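One way to get this padded batch, sketched under the same lowercase assumption: map each letter to an index, pad shorter words with an extra index 26, one-hot encode with 27 classes, then drop the last class so padding rows become all zeros. `encode_batch` is a hypothetical name. It still iterates over the input words to build the indices, but never over the output tensor.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

def encode_batch(words):
    # Hypothetical helper: one index tensor per word, letters 'a'..'z' -> 0..25
    seqs = [torch.tensor([ord(c) - ord('a') for c in w], dtype=torch.long) for w in words]
    # Pad to the longest word with the extra "padding" index 26 -> shape (n_words, max_l)
    padded = pad_sequence(seqs, batch_first=True, padding_value=26)
    # 27 classes so index 26 is valid; slicing off class 26 leaves padding rows all zero
    return F.one_hot(padded, num_classes=27)[..., :26].int()

print(encode_batch(["cd", "abc"]))
```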

We assume that words consist only of lowercase letters.

import numpy as np
import torch

def hot_one(words):
    max_l = max(len(w) for w in words)  # length of the longest word
    result = torch.empty((0, max_l, 26), dtype=torch.int32)  # starts with zero words
    for word in words:
        ints = np.frombuffer(word.encode('ascii'), dtype=np.uint8).astype(np.int64) - ord('a')  # letter values 0..25
        addition = np.zeros((max_l - ints.shape[0],)) - 1  # -1 padding for shorter words
        tr = torch.Tensor(np.expand_dims(np.hstack((ints, addition)), -1))  # shape (max_l, 1)
        tr = (tr == torch.arange(26)).int()  # broadcasts to (max_l, 26); -1 rows stay all zeros
        result = torch.cat((result, tr.unsqueeze(0)))  # append as a new (1, max_l, 26) slice
    return result
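A small sketch of why appending with `torch.cat` is fragile here: `torch.cat` joins along an existing dimension, so a (max_l, 26) slice needs a leading axis before it can be concatenated onto a (k, max_l, 26) tensor, and `torch.empty` leaves its contents uninitialized, so starting from `torch.empty((1, max_l, 26))` would keep a garbage first slice. `torch.stack` adds the leading axis itself. The tensors `a` and `b` below are placeholder slices for illustration.

```python
import torch

max_l = 3
a = torch.zeros((max_l, 26), dtype=torch.int32)  # placeholder slice for one word
b = torch.ones((max_l, 26), dtype=torch.int32)   # placeholder slice for another word

# torch.cat needs matching ranks: give each slice a leading axis first
batch_cat = torch.cat((a.unsqueeze(0), b.unsqueeze(0)))  # (2, max_l, 26)
# torch.stack inserts the new leading axis itself
batch_stack = torch.stack((a, b))                         # (2, max_l, 26)

print(batch_cat.shape, batch_stack.shape)
```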

The tricky part is that looping over the resulting tensor is not allowed. Any ideas how to do this?

Edit: only numpy and torch functions are allowed.
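A possible loop-free sketch using only numpy and torch, assuming lowercase ASCII words. It relies on NumPy's fixed-width byte-string dtype, which pads shorter words with zero bytes, and on a 27-row lookup table (26 one-hot rows plus one all-zero row for padding) indexed with PyTorch advanced indexing. This is one technique among several, not an established answer.

```python
import numpy as np
import torch

def hot_one(words):
    max_l = max(len(w) for w in words)  # length of the longest word
    # Fixed-width bytes: NumPy pads shorter words with b'\x00' (assumes ASCII input)
    raw = np.array(words, dtype=f"S{max_l}").view(np.uint8).reshape(len(words), max_l)
    codes = raw.astype(np.int64) - ord("a")  # 'a'..'z' -> 0..25
    codes[raw == 0] = 26                     # padding positions -> extra index 26
    # Row k is the one-hot vector for letter k; row 26 is all zeros for padding
    table = torch.cat((torch.eye(26).int(), torch.zeros((1, 26), dtype=torch.int32)))
    # Advanced indexing gathers one table row per (word, position): (n_words, max_l, 26)
    return table[torch.from_numpy(codes)]

print(hot_one(["cd", "abc"]))
```

The only Python-level iteration left is `max(...)` over the input words; the output tensor is built in a single indexing step.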

One-hot encoding a list of words at the character level without looping over the resulting tensor
