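Given that there is a path from two synsets up to their lowest common hypernym, it seems there should be some way to walk back down and find the hyponyms that lead to that hypernym.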
from nltk.corpus import wordnet as wn
alaska = wn.synset('Alaska.n.1')
california = wn.synset('California.n.1')
common_hypernym = alaska.lowest_common_hypernyms(california)[0]
common_hypernym
Synset('american_state.n.01')
common_hypernym.do_something_awesome()  # hypothetical method: this is what I am looking for
['Alabama.n.1', 'Alaska.n.1', ...]  # all 50 American states
Answer 0 (score: 1)
Use Synset1._shortest_hypernym_paths(Synset2) to find the hypernyms and their distances:
>>> from nltk.corpus import wordnet as wn
>>> alaska = wn.synset('Alaska.n.1')
>>> california = wn.synset('California.n.1')
>>> alaska._shortest_hypernym_paths(california)
{Synset('district.n.01'): 4, Synset('location.n.01'): 6, Synset('region.n.03'): 5, Synset('physical_entity.n.01'): 8, Synset('entity.n.01'): 9, Synset('state.n.01'): 2, Synset('administrative_district.n.01'): 3, Synset('object.n.01'): 7, Synset('alaska.n.01'): 0, Synset('*ROOT*'): 10, Synset('american_state.n.01'): 1}
Now find the minimum path:
>>> paths = alaska._shortest_hypernym_paths(california)
>>> min(paths, key=paths.get)
Synset('alaska.n.01')
Now, that is boring, because california and alaska are sister nodes in the WordNet hierarchy. Let's filter out all the sister nodes:
>>> paths = {k:v for k,v in paths.items() if v > 0}
>>> min(paths, key=paths.get)
Synset('american_state.n.01')
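The filter-then-minimize step can be tried without WordNet installed. The sketch below is only an illustration: plain strings stand in for Synset objects, the distances are copied from the output shown earlier, and the variable names are made up for this example. In this data the only zero-distance key is the synset the search started from.

```python
# Distances from a synset to each of its hypernyms, keyed by synset name.
# Strings are stand-ins for Synset objects (an assumption of this sketch).
paths = {
    "alaska.n.01": 0,
    "american_state.n.01": 1,
    "state.n.01": 2,
    "administrative_district.n.01": 3,
    "district.n.01": 4,
    "region.n.03": 5,
    "location.n.01": 6,
}

# Without filtering, the minimum is the zero-distance entry.
nearest_including_self = min(paths, key=paths.get)

# Dropping zero-distance entries leaves only proper ancestors.
ancestors = {name: dist for name, dist in paths.items() if dist > 0}
nearest_ancestor = min(ancestors, key=ancestors.get)

print(nearest_including_self)
print(nearest_ancestor)
```

Using `paths.get` as the key makes `min` compare the distances while still returning the dictionary key.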
Get the child nodes of american_state (I think this is the "thing you need"):
>>> min(paths, key=paths.get).hyponyms()
[Synset('free_state.n.02'), Synset('slave_state.n.01')]
>>> list(min(paths, key=paths.get).closure(lambda s:s.hyponyms()))
[Synset('free_state.n.02'), Synset('slave_state.n.01')]
This may look surprising, but in fact neither alaska nor california has any hypernyms recorded:
>>> alaska.hypernyms()
[]
>>> california.hypernyms()
[]
The connection established by _shortest_hypernym_paths goes through a virtual root; see "Is wordnet path similarity commutative?"
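A possible explanation for the empty lists, offered as a hedged note rather than as part of the answer above: WordNet records named entities such as Alaska as instances, and NLTK exposes those links through `instance_hypernyms()` and `instance_hyponyms()` rather than `hypernyms()` and `hyponyms()`. A minimal sketch under that assumption follows. `all_children` and `demo` are hypothetical helper names, and `demo` needs the WordNet corpus to be installed.

```python
def all_children(synset):
    """Return the class hyponyms followed by the instance hyponyms of a synset."""
    return synset.hyponyms() + synset.instance_hyponyms()


def demo():
    """Collect the names of everything directly below the common hypernym."""
    # Imported here so that all_children can be used without the corpus.
    from nltk.corpus import wordnet as wn

    alaska = wn.synset('Alaska.n.1')
    california = wn.synset('California.n.1')
    common = alaska.lowest_common_hypernyms(california)[0]
    return [s.name() for s in all_children(common)]
```

If the assumption holds, the list returned by `demo()` is expected to include the individual states alongside free_state.n.02 and slave_state.n.01.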
Answer 1 (score: 0)
A newer solution is:
from nltk.corpus import wordnet
alaska = wordnet.synset('Alaska.n.1')
california = wordnet.synset('California.n.1')
alaska.lowest_common_hypernyms(california)
[Synset('american_state.n.01')]
That older function is private and does not work this way (it may work some other way), but you can also use x.common_hypernyms(y) to find all common hypernyms.
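To make the difference between "all common hypernyms" and "the lowest common hypernym" concrete, here is a toy sketch. It is not NLTK's implementation: the hierarchy is a hypothetical single-parent tree (WordNet itself allows multiple parents), and every name in it is illustrative only.

```python
# Hypothetical toy hierarchy mapping each child to its single parent.
PARENT = {
    "alaska": "american_state",
    "california": "american_state",
    "american_state": "state",
    "state": "administrative_district",
    "administrative_district": "district",
}


def ancestors(node):
    """Return the ancestors of node, nearest first."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def common_ancestors(a, b):
    """Return the ancestors shared by a and b, nearest to a first."""
    b_ancestors = set(ancestors(b))
    return [n for n in ancestors(a) if n in b_ancestors]


def lowest_common_ancestor(a, b):
    """Return the nearest shared ancestor, or None if there is none."""
    shared = common_ancestors(a, b)
    return shared[0] if shared else None


print(common_ancestors("alaska", "california"))
print(lowest_common_ancestor("alaska", "california"))
```

The first call returns every shared ancestor, while the second keeps only the one closest to the two starting nodes.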