Question

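I have a dictionary whose keys are words and whose values are the vectors for those words. I also have a list of sentences that I want to convert to an array. So far I get an array of all the words, but what I want is an array of sentences made up of word vectors, so that I can feed it into a neural network.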

sentences=["For last 8 years life, Galileo house arrest espousing man's theory",
           'No. 2: 1912 Olympian; football star Carlisle Indian School; 6 MLB seasons Reds, Giants & Braves',
           'The city Yuma state record average 4,055 hours sunshine year'.......]    

word_vec={'For': [0.27452874183654785, 0.8040047883987427],
         'last': [-0.6316165924072266, -0.2768899202346802],
         'years': [-0.2496756911277771, 1.243837594985962],
         'life,': [-0.9836481809616089, -0.9561406373977661].....}

I want to convert the sentences above into the vectors of the corresponding words in the dictionary.

Answer 1

Try this:

def sentence_to_list(sentence, words_dict):
    return [w for w in sentence.split() if w in words_dict]

So the first sentence in the example is converted to:

['For', 'last', 'years', 'life,']  # split() keeps the comma on 'life,'; words not in the dictionary are not present here

Update.

I guess you need to remove punctuation. There are several ways to split a string on multiple delimiters; check this answer: Split Strings into words with multiple word boundary delimiters
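As a minimal sketch of one such approach, the function below tokenizes with `re.findall` instead of `str.split`, so punctuation is dropped before the dictionary lookup. The name `sentence_to_words` and the small `word_vec` dictionary are hypothetical and only for illustration; the sketch also assumes the dictionary keys are stored without punctuation ('life' rather than 'life,').

```python
import re

def sentence_to_words(sentence, words_dict):
    # Keep runs of word characters and apostrophes; other punctuation is dropped.
    tokens = re.findall(r"[\w']+", sentence)
    # Keep only the tokens that have an entry in the dictionary.
    return [w for w in tokens if w in words_dict]

# Hypothetical dictionary whose keys carry no punctuation (values are made up).
word_vec = {
    "For": [0.27, 0.80],
    "last": [-0.63, -0.28],
    "years": [-0.25, 1.24],
    "life": [-0.98, -0.96],
}

sentence = "For last 8 years life, Galileo house arrest espousing man's theory"
words = sentence_to_words(sentence, word_vec)
```

If the dictionary keys do include punctuation, as 'life,' does in the question, the lookup for the cleaned token will miss, so the keys would need the same cleaning.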

Answer 2

This creates vectors, a list of lists of vectors (one list per sentence):

vectors = []
for sentence in sentences:
    sentence_vec = [word_vec[word] for word in sentence.split() if word in word_vec]
    vectors.append(sentence_vec)

If you want to drop punctuation (, . : etc.), use re.findall (from the re module) instead of .split:

import re
words = re.findall(r"[\w']+", sentence)
sentence_vec = [word_vec[word] for word in words if word in word_vec]

If you don't want to skip words that are missing from word_vec, use:

sentence_vec = [word_vec[word] if word in word_vec else [0, 0] for word in words]

This puts [0, 0] in place of each missing word.
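As a rough sketch, the variants in this answer can be folded into one helper. The function name `sentences_to_vectors` and its `skip_unknown` flag are hypothetical, and the sketch assumes `word_vec` is non-empty and that all of its vectors have the same length, so the zero vector can be sized from any entry instead of being hard-coded as [0, 0].

```python
import re

def sentences_to_vectors(sentences, word_vec, skip_unknown=True):
    # Infer the vector length from any entry (assumes a non-empty dictionary
    # whose vectors all share one length).
    dim = len(next(iter(word_vec.values())))
    vectors = []
    for sentence in sentences:
        # Tokenize on word characters and apostrophes, dropping punctuation.
        words = re.findall(r"[\w']+", sentence)
        if skip_unknown:
            # Unknown words are left out entirely.
            sentence_vec = [word_vec[w] for w in words if w in word_vec]
        else:
            # Unknown words are replaced by a fresh zero vector.
            sentence_vec = [word_vec[w] if w in word_vec else [0.0] * dim
                            for w in words]
        vectors.append(sentence_vec)
    return vectors

# Small made-up inputs for illustration only.
word_vec = {"For": [0.27, 0.80], "last": [-0.63, -0.28]}
sentences = ["For last 8 years", "No last"]

skipped = sentences_to_vectors(sentences, word_vec)
padded = sentences_to_vectors(sentences, word_vec, skip_unknown=False)
```

Either way the result is a ragged list, since sentences differ in length; a network that expects fixed-shape input would still need the sentences padded or truncated to a common length.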

How to convert sentences to vectors
