How to interpret a heatmap of topic shares in documents

Time: 2015-10-05 09:17:51

Tags: python topic-modeling

[Image: heatmap of topic shares per document]

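I am trying to understand what this heatmap shows. How do topics 0 and 1 belong to the Austen novels, and how does topic 3 indicate an association with the Brontë novels? What does the intensity of the color measure?

Edit: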

In [27]: plt.pcolor(doctopic, norm=None, cmap='Blues')
Out[27]: <matplotlib.collections.PolyCollection at 0x2b10c1557048>

# put the major ticks at the middle of each cell
# the trailing semicolon ';' suppresses output
In [28]: plt.yticks(np.arange(doctopic.shape[0])+0.5, docnames);

In [29]: plt.xticks(np.arange(doctopic.shape[1])+0.5, topic_labels);

# flip the y-axis so the texts are in the order we anticipate (Austen first, then Brontë)
In [30]: plt.gca().invert_yaxis()

# rotate the ticks on the x-axis
In [31]: plt.xticks(rotation=90)
Out[31]: (array([ 0.5,  1.5,  2.5,  3.5,  4.5]), <a list of 5 Text xticklabel objects>)

# add a legend
In [32]: plt.colorbar(cmap='Blues')
Out[32]: <matplotlib.colorbar.Colorbar at 0x2b10d01f8320>

In [33]: plt.tight_layout()  # fixes margins

In [34]: plt.show()

1 answer:

Answer 0 (score: 0)

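The darker the color of a cell, the larger the share of words in that novel (left axis) that is associated with that topic (bottom axis). For example, in Austen_Emma many words belong to topic #3, while fewer words belong to topic #0. In Austen_Sense, most words are associated with topic #4. The heatmap helps you see which topics dominate in each novel.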
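
To make "share of words" concrete, here is a minimal sketch in plain Python. It assumes each row of `doctopic` holds one document's topic shares, which the question does not state explicitly. The document names, the counts, and the helpers `topic_shares` and `dominant_topic` are all made up for illustration and do not come from the original data.

```python
# Hypothetical per-topic word counts for two documents (made-up numbers).
topic_word_counts = {
    "Doc_A": [10, 30, 60],
    "Doc_B": [50, 25, 25],
}


def topic_shares(counts):
    """Convert per-topic word counts into shares that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def dominant_topic(shares):
    """Return the index of the topic with the largest share."""
    return max(range(len(shares)), key=lambda i: shares[i])


# One row of shares per document: this plays the role of a row in doctopic.
doc_shares = {name: topic_shares(c) for name, c in topic_word_counts.items()}

# The darkest cell in each heatmap row corresponds to this index.
dominant = {name: dominant_topic(s) for name, s in doc_shares.items()}

for name, shares in doc_shares.items():
    print(name, [round(s, 2) for s in shares], "dominant topic:", dominant[name])
```

Under that assumption, each heatmap row is one such list of shares, and the darkest cell in a row marks the document's dominant topic.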