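I want to create a set of alternative words for a word. The alternatives have to be suitably different: replacing "dog" with "dalmatian" is too similar; I would want to replace "dog" with "cat". While not foolproof, I figure I can do this by getting the hypernym of a word, then the hypernym of that hypernym (i.e. the grandparent synset), and ending up with all the grandchildren words of that grandparent.
Hope that makes sense. In pseudocode it would read: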
for each i as hypernym (synset)
    for each j as i.hypernym
        get all the holonyms for j as s
        for each s get all the holonyms as x
            print x
Is this feasible?
Answer 0 (score: 1)
from itertools import chain
from collections import defaultdict
from nltk.corpus import wordnet as wn

# Maps each lemma of a grandfather synset to the holonym lemmas found below it.
gflemma_holonym = defaultdict(set)

for ss in wn.all_synsets():
    # Keep only synsets that have part holonyms and a grandfather (hypernym of a hypernym).
    if ss.part_holonyms() and ss.hypernyms() and ss.hypernyms()[0].hypernyms():
        grandfather = ss.hypernyms()[0].hypernyms()[0] # grandfather concept.
        # Flatten the lemma names of all part holonyms of this synset.
        holonyms = list(chain(*[i.lemma_names() for i in ss.part_holonyms()]))
        for lemma in grandfather.lemma_names():
            gflemma_holonym[lemma].update(holonyms)

print gflemma_holonym[u'edible_nut']
print
print gflemma_holonym[u'geographical_area']
[OUT]:
set([u'black_hickory', u'black_walnut', u'Juglans_nigra', u'black_walnut_tree'])
set([u'battlefield', u'fair', u'infield', u'field_of_honor', u'field_of_battle', u'battleground', u'city', u'bowl', u'field', u'stadium', u'funfair', u'outfield', u'diamond', u'urban_area', u'populated_area', u'desert', u'arena', u'carnival', u'baseball_diamond', u'sports_stadium', u'ball_field', u'baseball_field'])
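Note that WordNet's coverage is limited, especially when you are looking for relations between concepts/lemmas that are far apart (i.e. from a synset's grandfather to the synset's holonyms).
Answer 1 (score: 0)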
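If what is wanted is literally "all the grandchildren of the grandparent", the walk goes up two hypernym levels and then down two hyponym levels (hyponyms rather than part holonyms). Below is a minimal sketch of that walk, not a definitive implementation. The function name `alternatives` and the toy taxonomy are hypothetical, invented for illustration; the parent and child lookups are passed in as callables so the example runs without the WordNet corpus.

```python
def alternatives(node, hypernyms, hyponyms):
    """Collect the grandchildren of every grandparent of `node`.

    `hypernyms` and `hyponyms` are callables returning the parents and
    children of a node, so the same walk works on any taxonomy.
    """
    found = set()
    for parent in hypernyms(node):
        for grandparent in hypernyms(parent):
            for child in hyponyms(grandparent):
                for grandchild in hyponyms(child):
                    found.add(grandchild)
    found.discard(node)  # the word itself is not an alternative
    return found


# Hypothetical toy taxonomy (child -> parents), invented for illustration.
PARENTS = {
    "dalmatian": ["dog"],
    "dog": ["canine"],
    "wolf": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
}

# Invert the mapping to get parent -> children.
CHILDREN = {}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN.setdefault(parent, []).append(child)

result = alternatives(
    "dog",
    lambda n: PARENTS.get(n, []),
    lambda n: CHILDREN.get(n, []),
)
print(sorted(result))  # ['cat', 'wolf']
```

With NLTK, and assuming the WordNet corpus is installed, the same function could be called as `alternatives(wn.synset('dog.n.01'), lambda s: s.hypernyms(), lambda s: s.hyponyms())`; the result is then a set of synsets rather than strings. Note that siblings such as "wolf" are returned too, so the result may still need filtering.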
You can do this with either a list or a dictionary (a dictionary is more Pythonic). For example, with a dictionary you would have something like this:
dictionnary={"dog": {"dalmatian","stuff"}, "singer": {"rihanna","eminem"}, "country": {"United states","England"}}
print(dictionnary['dog'])
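To get from such a dictionary to a replacement word, one option is to look up which key a word belongs to and then offer the other keys. The following is only a sketch under that assumption; the helper names `category_of` and `replacements` and the sample entries are hypothetical.

```python
# Hypothetical hand-made categories, in the same shape as the dictionary above.
categories = {
    "dog": {"dalmatian", "poodle"},
    "cat": {"siamese", "persian"},
}


def category_of(word, categories):
    """Return the key whose value set contains `word`, or None."""
    for key, members in categories.items():
        if word in members:
            return key
    return None


def replacements(word, categories):
    """Return the category names other than the one `word` belongs to."""
    own = category_of(word, categories)
    return sorted(key for key in categories if key != own and key != word)


print(category_of("dalmatian", categories))   # dog
print(replacements("dalmatian", categories))  # ['cat']
print(replacements("dog", categories))        # ['cat']
```

The drawback of this approach is that every category has to be written by hand.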