What is the use (advantage) of a term co-occurrence graph (TCG)?

Time: 2016-01-12 09:38:32

Tags: machine-learning artificial-intelligence data-mining text-mining information-retrieval

I am new to this field and very keen to build up my knowledge. While reading a research paper, I ran into a few questions. The paper says:

All the nouns are extracted from the given biomedical text document and a term co-occurrence graph (TCG) is built from these terms. The term co-occurrence graph represents the knowledge of the system. The TCG is treated as the background knowledge of the systems and is used for query expansion of the input query.

The TCG is queried for the semantic context of closure (SCC) of the given input query term.

What is this semantic context of closure (SCC)?

What is the advantage of using these co-occurrence graphs over existing search engines? Do search engines use these graphs as well?

I would also be glad if someone could point me to some resources on these topics.

1 Answer:

Answer 0 (score: 0)

Co-occurrence is just a way to record how words relate to each other given a corpus of text. If 'pain' and 'broken' occur in the same document, then a link is recorded, typically in a sparse matrix whose rows and columns are words/tokens/terms. An entry in the matrix represents the strength of the relation between the two words. A co-occurrence could be recorded as a 1, or, if the frequency of co-occurrence is important, you can increment the entry by one for every co-occurrence, or do that and additionally apply a transformation that rescales the strength of the co-occurrence.
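To make the counting concrete, here is a minimal sketch of document-level co-occurrence. It assumes whitespace tokenization and uses a dictionary keyed by term pairs as a stand-in for the sparse matrix; the function name `build_cooccurrence` and the toy documents are made up for illustration and are not from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(documents):
    """Count, for each unordered pair of distinct terms, how many documents contain both."""
    counts = Counter()
    for doc in documents:
        # Assumption: naive lowercase + whitespace tokenization.
        # Each term is counted at most once per document.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            # Pairs are stored in sorted order, so (a, b) and (b, a) share one entry.
            counts[(a, b)] += 1
    return counts


# Toy corpus, purely illustrative.
docs = [
    "pain broken arm",
    "broken hurt",
    "pain hurt",
]
cooc = build_cooccurrence(docs)
print(cooc[("broken", "pain")])  # 1: the two terms share exactly one document
```

Only pairs that actually co-occur get an entry, which is what makes the representation sparse. A real system would more likely use a proper tokenizer and a sparse matrix library, but the bookkeeping is the same.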

Search engines in general are huge beasts, with even huger bags of tricks. Co-occurrence is just one trick of many that search engines employ; PageRank is another, and a vastly more important one.

The advantage of co-occurrence is that it is very simple: you can use it on pure text, and you can apply fast matrix operations to the result. Things like PageRank might require other features, such as hyperlinks. Research papers have something similar that could be used with a sort of PageRank algorithm, namely references. However, if you're interested in the tokens inside the text, that doesn't really help you.

With regard to "semantic context of closure", I'm not entirely sure what they are referring to. I'd venture to think it means that, given a term or sentence or what have you, you go into the graph/network generated by co-occurrence to find other related terms. If we had co-occurrence between 'broken' and 'hurt' and between 'pain' and 'hurt', then when presented with the word 'pain' the closure would return all three words. But this is just my hunch. I'd have to read the paper to understand it in context.
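Under that reading, and only as a guess at what the paper means, the closure of a term is everything reachable from it in the co-occurrence graph. A rough sketch using breadth-first search, where the `closure` function and the edge list are hypothetical names chosen for this example:

```python
from collections import defaultdict, deque


def closure(edges, start):
    """Return every term reachable from `start` by following co-occurrence links."""
    # Build an undirected adjacency map from the co-occurrence pairs.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    # Standard breadth-first traversal from the query term.
    seen = {start}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for neighbour in graph[term]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# The two links from the example above.
edges = [("broken", "hurt"), ("pain", "hurt")]
print(sorted(closure(edges, "pain")))  # ['broken', 'hurt', 'pain']
```

On a real corpus an unrestricted closure would likely pull in most of the vocabulary, so a practical query-expansion step would presumably limit the number of hops or drop weak links. Whether the paper does either is something I can't tell without reading it.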