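I want to implement a function that takes a list of words and returns a tensor
of shape (#words, length of the longest word, 26).
The idea is to build, for each word, a (length of the longest word, 26) tensor in which every row is all zeros except for a single 1 marking the letter at that position. For example, the word "abc" would be represented by the following tensor: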
tensor([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], dtype=torch.int32)
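To make the encoding concrete, here is a minimal sketch for a single word. The helper name `one_hot_word` is hypothetical, and the sketch assumes the input contains only lowercase ASCII letters. It converts each letter to an index in 0..25 and compares the index column against `torch.arange(26)` using broadcasting.

```python
import numpy as np
import torch

def one_hot_word(word):
    # Letter indices 0..25; assumes lowercase ASCII letters only.
    raw = np.frombuffer(word.encode("ascii"), dtype=np.uint8)
    idx = torch.from_numpy(raw.astype(np.int64) - ord("a"))
    # (len, 1) compared with (26,) broadcasts to (len, 26).
    return (idx.unsqueeze(-1) == torch.arange(26)).int()

abc = one_hot_word("abc")
print(abc.shape)  # torch.Size([3, 26])
```

Each row of `abc` contains exactly one 1, located at the column of the corresponding letter.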
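Every "row" of the resulting tensor (each one representing a word) must have the same size, so I pad each word with rows of zeros.
For example, given the input list ["cd", "abc"], the resulting tensor should be: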
tensor([[[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]],
[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]]], dtype=torch.int32)
We assume the words consist only of lowercase letters.
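One possible way to get the zero-row padding described above is to pad the letter indices rather than the one-hot rows. The sketch below is only an illustration: `letter_indices` is a hypothetical helper, and it assumes `torch.nn.utils.rnn.pad_sequence` counts as an allowed Torch function. Shorter words are padded with -1, which matches none of the 26 letter indices, so the padded positions become all-zero rows after the comparison.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence

def letter_indices(word):
    # 0..25 for 'a'..'z'; assumes lowercase ASCII letters only.
    raw = np.frombuffer(word.encode("ascii"), dtype=np.uint8)
    return torch.from_numpy(raw.astype(np.int64) - ord("a"))

words = ["cd", "abc"]
# Pad shorter words with -1, an index that no letter uses.
padded = pad_sequence([letter_indices(w) for w in words],
                      batch_first=True, padding_value=-1)
# (n, L, 1) compared with (26,) broadcasts to (n, L, 26).
batch = (padded.unsqueeze(-1) == torch.arange(26)).int()
print(batch.shape)  # torch.Size([2, 3, 26])
```

This still iterates over the input words to build the index tensors, but the one-hot tensor itself is produced by a single broadcast comparison.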
import numpy as np
import torch

def hot_one(words):
    max_l = max(len(w) for w in words)  # length of the longest word
    result = torch.empty((1, max_l, 26)).int()  # uninitialized placeholder for the resulting tensor
    for word in words:
        # array of letter values (0 for 'a' ... 25 for 'z')
        ints = np.frombuffer(word.encode('ascii'), dtype=np.uint8) - ord('a')
        addition = np.zeros((max_l - ints.shape[0],)) - 1  # padding of -1 for shorter words
        tr = torch.Tensor(np.expand_dims(np.hstack((ints, addition)), -1))  # tensor of shape (max_l, 1)
        tr = (tr == torch.arange(26)).int()  # broadcast comparison gives the 0/1 rows, shape (max_l, 26)
        # result = torch.cat((result, tr))  # !!doesn't work!! (result is 3-D, tr is 2-D)
    print(result)
The tricky part is that looping over the resulting tensor is not allowed. Any ideas on how to do this?
Edit: only numpy and torch functions are allowed.
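For reference, a loop-free sketch under the stated constraints might look like the following. It is not a definitive implementation: the function name `one_hot_words` is hypothetical, and it relies on the assumption that all words are lowercase ASCII. A fixed-width NumPy bytes array pads shorter words with NUL bytes, so viewing it as `uint8` yields a rectangular (#words, longest length) matrix without any explicit loop.

```python
import numpy as np
import torch

def one_hot_words(words):
    # A fixed-width bytes array pads shorter words with NUL bytes (value 0).
    arr = np.array(words, dtype=np.bytes_)
    codes = arr.view(np.uint8).reshape(len(words), arr.dtype.itemsize)
    # NUL padding maps to -97, outside 0..25, so padded rows stay all zero.
    idx = torch.from_numpy(codes.astype(np.int64) - ord("a"))
    # (n, L, 1) compared with (26,) broadcasts to (n, L, 26).
    return (idx.unsqueeze(-1) == torch.arange(26)).int()

result = one_hot_words(["cd", "abc"])
print(result.shape)  # torch.Size([2, 3, 26])
```

The design choice here is to let NumPy's fixed-width string storage do the padding, so the only remaining work is one subtraction and one broadcast comparison.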