Using WordNet to generalize specific words to higher-order concepts

Time: 2015-01-22 11:58:31

Tags: machine-learning prolog nlp wordnet

Does WordNet have "higher-order" concepts? How can I generate them for a given word?

I have a corpus of data in the form of Prolog facts. I want to generalize the conceptual components, i.e. 'contains'('oranges', 'vitamin c'). and 'contains'('spinach','iron'). would generalize to 'contains'(<food>, <nutrient>).

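I don't know WordNet very well, so one thing I thought of was to generate all possible hypernyms and then combinatorially spell out every possible new rule, but that is rather a 'brute force' way of doing it.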

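Does WordNet store higher-order concepts such as <food>? That might make things easier, because I could then create a single new rule using the higher-order concept for that particular variable, assuming WordNet has one, rather than the fifty or a hundred rules I might end up with if I did it the brute-force way.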

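So what I really want to know is: is there a command that generates the higher-order concepts for each of the three components of a given fact? Or perhaps just for the two inside the parentheses. If such a command exists, what is it?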

Below is some of the data I am working with, for reference.

'be'('mr jiang', 'representing china').
'be'('hrh', 'britain').
'be more than'('# distinguished guests', 'the principal representatives').
'end with'('the playing of the british national anthem', 'hong kong').
'follow at'('the stroke of midnight', 'this').
'take part in'('the ceremony', 'both countries').
'start at about'('# pm', 'the ceremony').
'end about'('# am', 'the ceremony').
'lower'('the british hong kong flag', '# royal hong kong police officers').
'raise'('the sar flag', 'another #').
'leave for'('the royal yacht britannia', 'the #').
'hold by'('the chinese and british governments', 'the handover of hong kong').
'rise over'('this land', 'the regional flag of the hong kong special administrative region of the people \'s republic of china').
'cast eye on'('hong kong', 'the world').
'hold on'('schedule', 'the # governments').
'be festival for'('the chinese nation', 'this').
'go in'('the annals of history', 'july # , #').
'become master of'('this chinese land', 'the hong kong compatriots').
'enter era of'('development', 'hong kong').
'remember'('mr deng xiaoping', 'history').
'be along'('the course', 'it').
'resolve'('the hong kong question', 'we').
'wish to express thanks to'('all the personages', 'i').
'contribute to'('the settlement of the hong kong', 'both china and britain').
'support'('hong kong \'s return', 'the world').

1 Answer:

Answer 0: (score: 0)

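WordNet calls higher-order concepts "hypernyms". For example, a hypernym of the color "green" is "chromatic color", because green belongs to the higher-order class of chromatic colors.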

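Note that WordNet distinguishes between "words" (strings) and "synsets" (the meanings we associate with a given string). Just as a word can have several meanings, a string can have several synsets. If you want to retrieve all the higher-order meanings for a given word, you can run these lines in Python: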

from nltk.corpus import wordnet as wn

# Requires the WordNet corpus; fetch it once with nltk.download('wordnet').
# This lists every synset for "green" and then finds each synset's direct hypernyms.
# If you import with `from nltk import wordnet as wn` instead, then on nltk 3.0.1
# the call is `wn.wordnet.synsets('green')`, and on nltk 3.0.0 it is `wn.synsets('green')`.
for synset in wn.synsets('green'):
    for hypernym in synset.hypernyms():
        print(synset, hypernym)
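
To connect this back to the 'contains' example in the question, here is a minimal sketch of how two facts might be generalized once a hypernym hierarchy is available. It does not query WordNet: `TOY_HYPERNYMS` is a small hand-written stand-in, and the function names are hypothetical, chosen only for illustration. The idea is to walk up from each argument until the two chains meet, rather than enumerating every combination of hypernyms.

```python
# Hypothetical, hand-written hypernym links (child -> parent).
# This is NOT real WordNet data; it only mimics the shape of a hypernym hierarchy.
TOY_HYPERNYMS = {
    "oranges": "fruit",
    "spinach": "vegetable",
    "fruit": "food",
    "vegetable": "food",
    "vitamin c": "vitamin",
    "vitamin": "nutrient",
    "iron": "mineral",
    "mineral": "nutrient",
    "food": "entity",
    "nutrient": "entity",
}


def ancestors(term):
    """Return [term, parent, grandparent, ...] by following the toy links upward."""
    chain = [term]
    while chain[-1] in TOY_HYPERNYMS:
        chain.append(TOY_HYPERNYMS[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """Return the first ancestor of `a` that is also an ancestor of `b`, or None."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in b_chain:
            return candidate
    return None


def generalize(fact_a, fact_b):
    """Generalize two (predicate, arg1, arg2) facts that share the same predicate."""
    pred_a, a1, a2 = fact_a
    pred_b, b1, b2 = fact_b
    if pred_a != pred_b:
        return None
    return (pred_a, lowest_common_hypernym(a1, b1), lowest_common_hypernym(a2, b2))


rule = generalize(("contains", "oranges", "vitamin c"),
                  ("contains", "spinach", "iron"))
print(rule)
```

With real WordNet data, NLTK's `Synset.lowest_common_hypernyms(other)` plays a similar role to the toy function above, but you would first have to pick one synset per string, since each string can map to several senses.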