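I have a model trained with Word2Vec, and it works well. I want to plot only the words in a list that I pass in. I wrote the function below (reusing some code I found), and I get the following error when appending a vector to arr: "ValueError: all the input arrays must have same number of dimensions"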
def display_wordlist(model, wordlist):
    vector_dim = model.vector_size
    arr = np.empty((0,vector_dim), dtype='f') #dimension trained by the model
    word_labels = [word]

    # get words from word list and append vector to 'arr'
    for wrd in wordlist:
        word_array = model[wrd]
        arr = np.append(arr,np.array(word_array), axis=0) #This goes wrong

    # Use tsne to reduce to 2 dimensions
    tsne = TSNE(perplexity=65,n_components=2, random_state=0)
    np.set_printoptions(suppress=True)
    Y = tsne.fit_transform(arr)
    x_coords = Y[:, 0]
    y_coords = Y[:, 1]

    # display plot
    plt.figure(figsize=(16, 8))
    plt.plot(x_coords, y_coords, 'ro')
    for label, x, y in zip(word_labels, x_coords, y_coords):
        plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points')
    plt.xlim(x_coords.min()+0.00005, x_coords.max()+0.00005)
    plt.ylim(y_coords.min()+0.00005, y_coords.max()+0.00005)
    plt.show()
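Answer 0 (score: 1)
arr has shape (0, vector_dim), while word_array has shape (vector_dim,). One array is 2-D and the other is 1-D, which is why you get that error.
Reshaping word_array into a single row does the trick: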
word_array = model[wrd].reshape(1, -1)
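To see the mismatch in isolation, here is a minimal NumPy-only sketch. The 4-dimensional vector is made up and only stands in for model[wrd]; no trained model is involved.

```python
import numpy as np

vector_dim = 4  # hypothetical size; stands in for model.vector_size
arr = np.empty((0, vector_dim), dtype='f')      # 2-D, shape (0, 4)
word_array = np.arange(vector_dim, dtype='f')   # 1-D, shape (4,); stand-in for model[wrd]

# Appending a 1-D array to a 2-D array along axis 0 raises ValueError.
try:
    np.append(arr, word_array, axis=0)
    append_failed = False
except ValueError:
    append_failed = True

# Reshaping to one row of shape (1, 4) makes the dimensions agree.
arr = np.append(arr, word_array.reshape(1, -1), axis=0)
print(append_failed, arr.shape)  # True (1, 4)
```

The -1 in reshape(1, -1) lets NumPy infer the column count, so the same line works for any vector size.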
Also, why pass in a word list rather than "querying" the model for its words?
wordlist = list(model.wv.vocab)
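One caveat, as far as I know: the line above uses the gensim 3.x attribute name, and gensim 4.x exposes the vocabulary as key_to_index instead. A small sketch of a helper that handles both follows; vocab_words is a hypothetical name, and the SimpleNamespace objects are stand-ins so it runs without a trained model.

```python
from types import SimpleNamespace

def vocab_words(wv):
    """Return the words known to a KeyedVectors-like object as a list.

    Hypothetical helper: it checks the gensim 4.x attribute first and
    falls back to the gensim 3.x one used in the answer above.
    """
    if hasattr(wv, "key_to_index"):   # gensim 4.x naming
        return list(wv.key_to_index)
    return list(wv.vocab)             # gensim 3.x naming

# Stand-in objects (not real models) to exercise both branches.
new_style = SimpleNamespace(key_to_index={"king": 0, "queen": 1})
old_style = SimpleNamespace(vocab={"king": object(), "queen": object()})
print(vocab_words(new_style), vocab_words(old_style))
```

With a real model you would pass model.wv to the helper.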
Answer 1 (score: 0)
Thanks. I have now modified the code, and it gives the correct result:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def display_wordlist(model, wordlist):
    # Keep only the words the model knows, so labels and vectors stay aligned
    vectors = [model[word] for word in wordlist if word in model.wv.vocab.keys()]
    word_labels = [word for word in wordlist if word in model.wv.vocab.keys()]
    word_vec_zip = zip(word_labels, vectors)

    # Convert to a dict and then to a DataFrame
    word_vec_dict = dict(word_vec_zip)
    df = pd.DataFrame.from_dict(word_vec_dict, orient='index')

    # Use tsne to reduce to 2 dimensions
    tsne = TSNE(perplexity=65,n_components=2, random_state=0)
    np.set_printoptions(suppress=True)
    Y = tsne.fit_transform(df)
    x_coords = Y[:, 0]
    y_coords = Y[:, 1]

    # display plot
    plt.figure(figsize=(16, 8))
    plt.plot(x_coords, y_coords, 'ro')
    for label, x, y in zip(df.index, x_coords, y_coords):
        plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points')
    plt.xlim(x_coords.min()+0.00005, x_coords.max()+0.00005)
    plt.ylim(y_coords.min()+0.00005, y_coords.max()+0.00005)
    plt.show()
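The dict and DataFrame are one way to keep labels and vectors aligned. As an alternative, here is a sketch of the same filtering step with NumPy only; collect_vectors is a hypothetical helper, and the plain dict toy stands in for the model's word lookup.

```python
import numpy as np

def collect_vectors(lookup, wordlist):
    """Return (labels, matrix) for the words in wordlist that lookup knows.

    Hypothetical helper: `lookup` is any mapping from word to 1-D vector.
    """
    labels = [w for w in wordlist if w in lookup]
    if not labels:
        return labels, np.empty((0, 0), dtype='f')
    # vstack turns a list of (vector_dim,) arrays into (n_words, vector_dim)
    matrix = np.vstack([lookup[w] for w in labels])
    return labels, matrix

# Toy stand-in for a trained model; "bird" is deliberately missing.
toy = {"cat": np.ones(3, dtype='f'), "dog": np.zeros(3, dtype='f')}
labels, matrix = collect_vectors(toy, ["cat", "bird", "dog"])
print(labels, matrix.shape)  # ['cat', 'dog'] (2, 3)
```

Unlike the dict-based version, this keeps duplicate words if the input list contains any, so deduplicate the list first if that matters.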