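I am working on sentiment analysis of Amazon food reviews, and I am trying to apply Word2Vec to the reviews and visualize the result with t-SNE.
I was able to visualize the Bag of Words representation easily with the following code: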
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Take the first 2000 bag-of-words rows and convert them to a dense array
data_2000 = final_counts[0:2000, :]
top_2000 = data_2000.toarray()
labels = final['Score']
labels_2000 = labels[0:2000]

model = TSNE(n_components=2, random_state=0)
tsne_data = model.fit_transform(top_2000)

# Create a new data frame that helps with plotting the result
tsne_data = np.vstack((tsne_data.T, labels_2000)).T
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "label"))

# Plot the t-SNE result (seaborn 0.9+ uses height; older releases used size)
sns.FacetGrid(tsne_df, hue="label", height=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()
However, the same code does not work when I pass it w2v_model, which is of type gensim.models.word2vec.Word2Vec.
I obtained that model with the following code:
import gensim

# Note: gensim 4.0 renamed the size parameter to vector_size
w2v_model = gensim.models.Word2Vec(list_of_sent, min_count=5, size=50, workers=4)
Answer 0 (score: 0)
You need to extract all the word embeddings after training the model. I suggest extracting them into a pd.DataFrame as follows:
all_vocab = list(w2v_model.wv.vocab.keys())
data_dict = {word: w2v_model.wv[word] for word in all_vocab}
result = pd.DataFrame(data=data_dict).transpose()
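Note that `wv.vocab` is only available in gensim releases before 4.0. As a rough sketch for newer releases, assuming the keyed vectors expose `index_to_key` and `vectors` (as gensim 4 does), the same table can be built directly from those two attributes. The helper name `vectors_to_frame` and the stand-in object `fake_wv` are hypothetical and only there so the snippet runs without a trained model:

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def vectors_to_frame(wv):
    """Return a DataFrame with one row per word and one column per dimension."""
    return pd.DataFrame(wv.vectors, index=wv.index_to_key)


# Hypothetical stand-in for w2v_model.wv: 3 words with 4-dimensional vectors
fake_wv = SimpleNamespace(
    index_to_key=["good", "bad", "tasty"],
    vectors=np.arange(12, dtype=float).reshape(3, 4),
)

result = vectors_to_frame(fake_wv)
```

With a real model, `vectors_to_frame(w2v_model.wv)` should give the same layout as the dictionary-based version above: words as the index, embedding dimensions as columns.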
If you want to perform the dimensionality reduction in scikit-learn, simply pass result.values to it.
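As a minimal sketch of that last step, assuming `result` is a DataFrame with one row per word, the values can be fed to `TSNE` and the 2-D coordinates kept alongside the words. The helper name `tsne_frame`, the random stand-in vectors, and the perplexity value are assumptions for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE


def tsne_frame(result, perplexity=5.0, random_state=0):
    """Project word vectors to 2-D and keep the words as the index."""
    model = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    coords = model.fit_transform(result.values)
    return pd.DataFrame(coords, index=result.index, columns=["Dim_1", "Dim_2"])


# Hypothetical stand-in for the embedding table: 30 words, 50 dimensions
rng = np.random.default_rng(0)
words = [f"word_{i}" for i in range(30)]
result = pd.DataFrame(rng.normal(size=(30, 50)), index=words)

tsne_df = tsne_frame(result)
```

Each row here is a word rather than a review, so the Score labels from the question do not apply to these points; annotating each point with its word is the natural way to read the plot. Perplexity must stay below the number of words being embedded.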
Answer 1 (score: 0)
from torchtext import vocab
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
glove = vocab.GloVe(name = '6B', dim = 100)
print(f'There are {len(glove.itos)} words in the vocabulary')
def tsne_plot(glove, n=200, n_components=2):
    "Creates a t-SNE model and plots it"
    labels = []
    tokens = []
    # Collect the first n words and their vectors as NumPy arrays
    for word, tensor_value in zip(glove.itos[:n], glove.vectors[:n]):
        tokens.append(tensor_value.numpy())
        labels.append(word)
    # Recent scikit-learn releases rename n_iter to max_iter
    tsne_model = TSNE(perplexity=40, n_components=n_components, init='pca', n_iter=2500, random_state=23)
    new_values = tsne_model.fit_transform(np.array(tokens))
    fig = plt.figure(figsize=(16, 16))
    if n_components == 3:
        ax = fig.add_subplot(111, projection='3d')
        ax.scatter(new_values[:, 0], new_values[:, 1], new_values[:, 2], c="r", marker="o")
        for i in range(len(new_values)):
            ax.text(new_values[i][0], new_values[i][1], new_values[i][2], labels[i])
    else:
        plt.scatter(new_values[:, 0], new_values[:, 1])
        # Label each point with its word
        for i in range(len(new_values)):
            plt.annotate(labels[i],
                         xy=(new_values[i][0], new_values[i][1]),
                         xytext=(5, 2),
                         textcoords='offset points',
                         ha='right',
                         va='bottom')
    plt.show()
    return new_values, labels

new_values, labels = tsne_plot(glove, n_components=2)