
Time: 2015-05-14 09:04:21

Tags: machine-learning nlp

I am trying to analyze the paper 'Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis'.



When implementing a process like singular value decomposition (SVD) or Markov
chain Monte Carlo machines, a corpus of documents can be partitioned on the
basis of inherent characteristics and assigned to categories by applying different
weights to the features that constitute each singular data index. In this high-dimensional
space it is often difficult to determine the combination of factors
leading to an outcome or result; the variables of interest are “hidden” or latent.
By defining a set of humanly intelligible categories, i.e. Wikipedia article
pages, as a basis for comparison, [Gabrilovich et al. 2007] have devised a system
whereby the criteria used to distinguish a datum are readily comprehensible.
From the text we note that “semantic analysis is explicit in the sense that we
manipulate manifest concepts grounded in human cognition, rather than ‘latent
concepts’ used by Latent Semantic Analysis”.
With that we have now established Explicit Semantic Analysis in opposition
to Latent Semantic Analysis.


Information on this topic is somewhat sparse. This question ostensibly deals with a similar problem, but not really.

2 answers:

Answer 0 (score: 1)




Answer 1 (score: 1)


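ESA - uses a knowledge base such as Wikipedia to build an inverted index that maps words to content (i.e. the titles of the Wikipedia pages in which each word occurs). It then operates on this vector representation of words, where each word becomes a vector over page titles with entries of 0 or 1.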
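A minimal sketch of the ESA idea described above, assuming a four-article toy knowledge base standing in for Wikipedia. The articles and the helper names `build_inverted_index`, `concept_vector`, `explain` and `relatedness` are hypothetical, chosen for illustration. Following the answer, word vectors are 0/1 over article titles; the original paper weights entries with TF-IDF, which this sketch omits. Two texts are compared with cosine similarity of their concept vectors.

```python
import math
from collections import defaultdict

# Hypothetical miniature "Wikipedia": article title -> article text.
ARTICLES = {
    "Cat": "cat pet animal fur whiskers",
    "Dog": "dog pet animal fur bark",
    "Car": "car vehicle engine wheel road",
    "Truck": "truck vehicle engine cargo road",
}

TITLES = sorted(ARTICLES)                        # fixed order of concept dimensions
POSITION = {t: i for i, t in enumerate(TITLES)}  # title -> vector index


def build_inverted_index(articles):
    """Map each word to the set of article titles it appears in."""
    index = defaultdict(set)
    for title, text in articles.items():
        for word in text.lower().split():
            index[word].add(title)
    return index


INDEX = build_inverted_index(ARTICLES)


def concept_vector(text):
    """Sum the 0/1 title vectors of every word in `text` (no TF-IDF here)."""
    vec = [0.0] * len(TITLES)
    for word in text.lower().split():
        for title in INDEX.get(word, ()):  # unknown words contribute nothing
            vec[POSITION[title]] += 1.0
    return vec


def explain(text):
    """Return the non-zero concept weights, keyed by human-readable title."""
    return {t: w for t, w in zip(TITLES, concept_vector(text)) if w}


def relatedness(a, b):
    """Cosine similarity of two texts in the explicit concept space."""
    u, v = concept_vector(a), concept_vector(b)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


print(explain("pet"))                # {'Cat': 1.0, 'Dog': 1.0}
print(relatedness("pet", "fur"))     # 1.0
print(relatedness("pet", "engine"))  # 0.0
```

Because every dimension is an article title, `explain()` can show which concepts make two texts related; that readability is the "explicit" part the quoted passage contrasts with LSA's latent dimensions.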

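LSA - uses singular value decomposition to project the word-document matrix into a lower-rank space, so that the dot product of two words that never co-occur in any document, but do co-occur with a similar set of other words, becomes higher. (Imagine that Cat and Car never co-occur in a document, but Cat co-occurs with Man in some document D_1, while Car co-occurs with Man in another document D_2.)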
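A minimal sketch of the Cat/Car example with NumPy, assuming a hand-made word-document count matrix. The third document (tree, leaf), the helper names `lsa_word_vectors` and `dot`, and the choice k=2 are illustrative assumptions. Word vectors in the latent space are taken as the rows of U_k Σ_k from the truncated SVD.

```python
import numpy as np

# Hypothetical toy corpus: D1 mentions cat and man, D2 mentions car and man,
# D3 mentions tree and leaf. Rows are words, columns are documents.
words = ["cat", "car", "man", "tree", "leaf"]
X = np.array([
    [1, 0, 0],  # cat
    [0, 1, 0],  # car
    [1, 1, 0],  # man
    [0, 0, 1],  # tree
    [0, 0, 1],  # leaf
], dtype=float)


def lsa_word_vectors(X, k):
    """Project words into a rank-k latent space: rows of U_k * Sigma_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)  # s is sorted descending
    return U[:, :k] * s[:k]


def dot(vectors, a, b):
    """Dot product between the vectors of words a and b."""
    return float(vectors[words.index(a)] @ vectors[words.index(b)])


raw = X                             # each row is a word's document-occurrence vector
latent = lsa_word_vectors(X, k=2)   # keep the two largest singular values

print("raw    cat.car  =", dot(raw, "cat", "car"))             # 0.0: never share a document
print("latent cat.car  =", round(dot(latent, "cat", "car"), 3))  # about 0.5: both co-occur with man
print("latent cat.tree =", abs(round(dot(latent, "cat", "tree"), 3)))  # about 0.0: no shared context
```

The effect comes from discarding the smallest singular value: with k=3 (full rank for this matrix) the latent dot products equal the raw ones, so cat.car would be 0 again.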