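I am trying to visualize the word2vec model I created from the Amazon reviews corpus. I sampled about 5k positive and 5k negative rows, and the score column indicates whether each review is positive or negative. Here is my code: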
For avg w2v I did this (`list_of_sent` is the list of tokenized reviews):
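```python
import gensim
from sklearn.manifold import TSNE

# gensim 3.x API; gensim 4 renamed size= to vector_size= and replaced wv.vocab with wv.index_to_key
w2v_model = gensim.models.Word2Vec(list_of_sent, min_count=5, size=50, workers=4)
Y = w2v_model[w2v_model.wv.vocab]  # one 50-d vector per vocabulary word
tsne = TSNE(n_components=2, perplexity=30)
tsne_data = tsne.fit_transform(Y)
```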
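Note that `w2v_model[w2v_model.wv.vocab]` returns one vector per vocabulary word, not one per review, so the resulting t-SNE points have no review score attached to them. To colour points by score you need one vector per review, for example the average of that review's word vectors. The following is a minimal sketch, assuming `list_of_sent` is a list of token lists aligned row-for-row with the sampled DataFrame; the helper name `average_review_vectors` is hypothetical. It only relies on `word in wv` and `wv[word]`, which gensim's `KeyedVectors` (`w2v_model.wv`) supports.

```python
import numpy as np

def average_review_vectors(tokenized_reviews, wv, dim):
    """Return an (n_reviews, dim) array with one averaged word vector per review.

    wv: any mapping from word to a length-`dim` vector that supports
    `word in wv` and `wv[word]` (e.g. gensim's w2v_model.wv).
    Reviews with no in-vocabulary words get an all-zero vector.
    """
    vectors = np.zeros((len(tokenized_reviews), dim), dtype=np.float32)
    for i, tokens in enumerate(tokenized_reviews):
        # Keep only words the model learned (min_count=5 drops rare words)
        known = [wv[word] for word in tokens if word in wv]
        if known:
            vectors[i] = np.mean(known, axis=0)
    return vectors
```

Usage, under the same assumptions: `sent_vectors = average_review_vectors(list_of_sent, w2v_model.wv, 50)`, then `tsne_data = TSNE(n_components=2, perplexity=30).fit_transform(sent_vectors)`. Row *i* of `tsne_data` then belongs to review *i* and can be coloured by that review's score.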
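Now I want to plot these according to the score, i.e. blue dots for positive reviews and red dots for negative ones, but I don't know how to do it. Any help would be greatly appreciated.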
Answer 0 (score: 0)
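If I understand correctly, you basically want a scatter plot with X = t-SNE component 1 and Y = t-SNE component 2, coloured by the target variable (positive or negative).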
The following sample code does this:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Put the two t-SNE components into a DataFrame
tsneDf = pd.DataFrame(data=tsne_data,
                      columns=['TSNE component 1', 'TSNE component 2'])
# Create a DataFrame of the t-SNE components and the score column.
# reset_index aligns the rows with tsneDf (0..n-1); this assumes tsne_data
# has one row per review, in the same order as df.
finalDf = pd.concat([tsneDf, df[['ScoreColumn']].reset_index(drop=True)], axis=1)
# Now we just plot a scatter plot
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('TSNE Component 1', fontsize=15)
ax.set_ylabel('TSNE Component 2', fontsize=15)
ax.set_title('2 component TSNE', fontsize=20)
# In this example 0: Negative and 1: Positive; map each to its colour
targets = [0, 1]
colors = ['r', 'b']
labels = ['Negative', 'Positive']
for target, color, label in zip(targets, colors, labels):
    indicesToKeep = finalDf['ScoreColumn'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'TSNE component 1'],
               finalDf.loc[indicesToKeep, 'TSNE component 2'],
               c=color, s=25, alpha=0.4, label=label)
ax.legend()
ax.grid()
plt.show()
```
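The per-class loop can also be replaced by a single `scatter` call that receives one colour per point. A minimal sketch, assuming the labels are 0 = negative and 1 = positive and that `tsne_data` has one row per review in the same order as the labels; `label_colors` and `plot_tsne` are hypothetical helper names. Blue for positive and red for negative follows the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; remove in a notebook
import matplotlib.pyplot as plt

def label_colors(labels):
    """Map each label to a colour: 1 (positive) -> blue, anything else -> red."""
    return ["blue" if label == 1 else "red" for label in labels]

def plot_tsne(tsne_data, labels):
    """Scatter the two t-SNE components, colouring each point by its label."""
    tsne_data = np.asarray(tsne_data)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(tsne_data[:, 0], tsne_data[:, 1],
               c=label_colors(labels), s=25, alpha=0.4)
    ax.set_xlabel("TSNE component 1")
    ax.set_ylabel("TSNE component 2")
    ax.grid(True)
    return fig, ax
```

One call draws all points at once, but unlike the loop above it produces no automatic legend entries per class.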
Answer 1 (score: 0)
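You have already mapped the dataset to two dimensions, so it can be turned into three columns: x and y from t-SNE, plus the positive/negative class column.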
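A minimal sketch of that three-column layout with pandas, assuming `tsne_data` is an (n, 2) array with one row per review and `scores` holds the matching class labels in the same order. The column names `x`, `y`, `label` and the function name `tsne_frame` are illustrative only.

```python
import numpy as np
import pandas as pd

def tsne_frame(tsne_data, scores):
    """Combine 2-D t-SNE output and class labels into one three-column DataFrame."""
    tsne_data = np.asarray(tsne_data)
    # Convert to a plain array so a pandas Series with a non-default index
    # (e.g. after sampling) cannot be misaligned by index matching
    scores = np.asarray(scores)
    if len(tsne_data) != len(scores):
        raise ValueError("need exactly one t-SNE point per label")
    return pd.DataFrame({
        "x": tsne_data[:, 0],
        "y": tsne_data[:, 1],
        "label": scores,
    })
```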
You can then draw the chart with matplotlib's scatter plot (`plt.scatter`).
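Plotting a three-column frame like the one described above takes one `scatter` call if the 0/1 labels are passed as `c` together with a two-colour `ListedColormap`. A sketch, assuming a DataFrame with columns `x`, `y` and a 0/1 `label` column (0 = negative, 1 = positive); `scatter_by_label` is a hypothetical name, and `legend_elements` requires matplotlib 3.1 or later.

```python
import pandas as pd  # the input is expected to be a pandas DataFrame
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; remove in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def scatter_by_label(frame):
    """Plot frame['x'] vs frame['y']: red for label 0, blue for label 1."""
    fig, ax = plt.subplots(figsize=(8, 8))
    # vmin/vmax pin 0 to the first colour and 1 to the second
    points = ax.scatter(frame["x"], frame["y"], c=frame["label"],
                        cmap=ListedColormap(["red", "blue"]),
                        vmin=0, vmax=1, s=25, alpha=0.4)
    # One legend handle per distinct label value, in ascending order (0, then 1)
    handles, _ = points.legend_elements()
    ax.legend(handles, ["Negative", "Positive"])
    ax.set_xlabel("TSNE component 1")
    ax.set_ylabel("TSNE component 2")
    return fig, ax
```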