Question

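I'm trying to build a chatbot for a project, and I'm using spaCy. I'm following a tutorial, and I need to create a 2D array X with as many rows as there are sentences in my dataset. Each row should be the word vector describing a sentence. However, when I try to build this array, I get an error. I'm not sure what is causing it, as I'm new to spaCy and to NLP in general.

I tried to work out the problem from the documentation. I also searched Stack Overflow, but couldn't find anything that explains my problem.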

import spacy
import numpy
#load spacy nlp model
nlp = spacy.load("en_core_web_sm")

#calculate the length of my sentences dataset
n_sentences = len(sentences)
#calculate the dimensionality of nlp model
embedding_dim = nlp.vocab.vectors_length
#X is a 2D array with as many rows as there are sentences in my dataset
#Each row is a vector describing the sentence
#initialise array with zeros
X = numpy.zeros((n_sentences, embedding_dim))
#iterate over sentences
for idx, sentence in enumerate(sentences):
   #pass each sentence to nlp object to create document
   doc = nlp(sentence)
   print(doc.vector.shape)
   #save document's .vector attribute to corresponding row in X
   X[idx, :] = doc.vector

As far as I can tell, it is the last line that raises the error.

ValueError: could not broadcast input array from shape (96) into shape (1,0)

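I don't know what is causing this, as I'm not very familiar with numpy arrays and array shapes. My dataset (sentences) is a simple list of strings. I expected to end up with a 2D array containing the word vectors. The tutorial I'm following says the code is correct, so I'm not sure why it isn't working for me; I think I must have missed something.

Edit: Could someone please help? I've gone over the code again trying to find the error, but I can't fix it. I really need to figure this out soon for my A Level project, otherwise I won't be able to finish it.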

Answer 1

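The en_core_web_sm model does not include word vectors. As a result, nlp.vocab.vectors_length is 0, so X is created with zero columns and the 96-element doc.vector cannot be assigned to any of its rows. You can download the en_core_web_md or en_core_web_lg models instead.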
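To see why the assignment fails, here is a minimal sketch that reproduces the shape mismatch with numpy alone. The 96-element array is only a stand-in for doc.vector and the 0 is a stand-in for nlp.vocab.vectors_length (both values are taken from the shapes in the error message); the variable names are illustrative.

```python
import numpy

n_sentences = 3
# Stand-in for nlp.vocab.vectors_length when the model ships no word vectors
embedding_dim = 0

# X ends up with three rows and zero columns
X = numpy.zeros((n_sentences, embedding_dim))

# Stand-in for doc.vector, which had 96 elements in the question
fake_doc_vector = numpy.ones(96, dtype="float32")

try:
    # A 96-element vector cannot be broadcast into a zero-length row
    X[0, :] = fake_doc_vector
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(X.shape)
print(error_message)
```

The exact wording of the message depends on the numpy version, but it is the same kind of broadcast error as in the question.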

Reference

nlp = spacy.load("en_core_web_md")
print(nlp.vocab.vectors_length)

Output:

300
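With a model that has vectors, the loop from the question should work unchanged. As a rough sketch, the same loop can be wrapped in a helper that fails early with a clearer message when the model has no vectors. The function name build_sentence_matrix and the wording of its error are hypothetical, not part of spaCy; it only assumes that nlp is callable and exposes nlp.vocab.vectors_length and doc.vector, as in the question. The model can be installed with python -m spacy download en_core_web_md.

```python
import numpy


def build_sentence_matrix(sentences, nlp):
    """Stack one doc.vector per sentence into a 2D array (hypothetical helper)."""
    embedding_dim = nlp.vocab.vectors_length
    if embedding_dim == 0:
        # Guard against models without word vectors, such as en_core_web_sm
        raise ValueError(
            "The loaded model has no word vectors (vectors_length is 0); "
            "load a model that ships vectors, such as en_core_web_md."
        )
    # One row per sentence, one column per vector dimension
    X = numpy.zeros((len(sentences), embedding_dim))
    for idx, sentence in enumerate(sentences):
        X[idx, :] = nlp(sentence).vector
    return X


# Usage (assumes spaCy and en_core_web_md are installed):
# import spacy
# nlp = spacy.load("en_core_web_md")
# X = build_sentence_matrix(["Hello there", "How are you?"], nlp)
# print(X.shape)
```

Raising early keeps the failure next to its cause instead of surfacing later as a broadcast error inside the loop.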
