Getting the derived forms of a word in Python

Time: 2013-02-11 21:20:19

Tags: python nltk

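I would like to know whether I can get all the derived forms of a given word.

For example, given the word "good", I would get "goodness", "well", and so on.

In particular, I want to get the noun related to an adjective.

Thanks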

2 answers:

Answer 0: (score: 0)

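Doing what you want is somewhat difficult for several reasons, ambiguity above all. The problem eases a little if you pick the correct word sense yourself, or apply automatic word sense disambiguation (WSD).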
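
As a rough illustration of what automatic sense selection can look like, here is a minimal sketch of a simplified Lesk-style heuristic: pick the sense whose definition shares the most words with the surrounding context. `pick_sense` is a hypothetical helper, not part of NLTK; it only assumes objects with a `definition()` method, which NLTK's Synset objects provide. NLTK ships its own variant of this idea as `nltk.wsd.lesk`.

```python
def pick_sense(context_words, synsets):
    """Return the synset whose definition overlaps most with the context.

    Hypothetical helper: a simplified Lesk-style heuristic, not NLTK's API.
    Ties go to the earliest synset; an empty synset list yields None.
    """
    context = {word.lower() for word in context_words}
    best, best_overlap = None, -1
    for synset in synsets:
        # Bag-of-words view of the dictionary definition (gloss).
        gloss = set(synset.definition().lower().split())
        overlap = len(context & gloss)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best
```

With NLTK and the WordNet data installed, one way to call it would be `pick_sense(sentence.split(), wordnet.synsets('good'))`. Plain word overlap is a weak signal, so treat the result as a starting point rather than a reliable choice.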

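To give a sense of the ambiguity: for the word "good", WordNet has 27 senses (synsets) and 37 unique lemmas (dictionary forms).

Here is a simple example using NLTK's implementation of WordNet.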

>>> from nltk.corpus import wordnet
>>> good = wordnet.synsets('good')

>>> # Collect every lemma name across all senses of "good".
>>> lemmas = set()
>>> for synset in good:
...     for lemma in synset.lemmas():
...         lemmas.add(lemma.name())
...
>>> sorted(lemmas)
['adept', 'beneficial', 'commodity', 'dear', 'dependable', 'effective', 'estimable', 'expert', 'full', 'good', 'goodness', 'honest', 'honorable', 'in_effect', 'in_force', 'just', 'near', 'practiced', 'proficient', 'respectable', 'right', 'ripe', 'safe', 'salutary', 'secure', 'serious', 'skilful', 'skillful', 'sound', 'soundly', 'thoroughly', 'trade_good', 'undecomposed', 'unspoiled', 'unspoilt', 'upright', 'well']
>>> len(lemmas)
37

>>> # Print each sense together with the lemmas that belong to it.
>>> for synset in good:
...     print(synset)
...     print(synset.lemmas())
...     print('-' * 79)
...

Synset('good.n.01')
[Lemma('good.n.01.good')]
-------------------------------------------------------------------------------
Synset('good.n.02')
[Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
-------------------------------------------------------------------------------
Synset('good.n.03')
[Lemma('good.n.03.good'), Lemma('good.n.03.goodness')]
-------------------------------------------------------------------------------
Synset('commodity.n.01')
[Lemma('commodity.n.01.commodity'), Lemma('commodity.n.01.trade_good'), Lemma('commodity.n.01.good')]
-------------------------------------------------------------------------------
Synset('good.a.01')
[Lemma('good.a.01.good')]
-------------------------------------------------------------------------------
Synset('full.s.06')
[Lemma('full.s.06.full'), Lemma('full.s.06.good')]
-------------------------------------------------------------------------------
Synset('good.a.03')
[Lemma('good.a.03.good')]
-------------------------------------------------------------------------------
Synset('estimable.s.02')
[Lemma('estimable.s.02.estimable'), Lemma('estimable.s.02.good'), Lemma('estimable.s.02.honorable'), Lemma('estimable.s.02.respectable')]
-------------------------------------------------------------------------------
Synset('beneficial.s.01')
[Lemma('beneficial.s.01.beneficial'), Lemma('beneficial.s.01.good')]
-------------------------------------------------------------------------------
Synset('good.s.06')
[Lemma('good.s.06.good')]
-------------------------------------------------------------------------------
Synset('good.s.07')
[Lemma('good.s.07.good'), Lemma('good.s.07.just'), Lemma('good.s.07.upright')]
-------------------------------------------------------------------------------
Synset('adept.s.01')
[Lemma('adept.s.01.adept'), Lemma('adept.s.01.expert'), Lemma('adept.s.01.good'), Lemma('adept.s.01.practiced'), Lemma('adept.s.01.proficient'), Lemma('adept.s.01.skillful'), Lemma('adept.s.01.skilful')]
-------------------------------------------------------------------------------
Synset('good.s.09')
[Lemma('good.s.09.good')]
-------------------------------------------------------------------------------
Synset('dear.s.02')
[Lemma('dear.s.02.dear'), Lemma('dear.s.02.good'), Lemma('dear.s.02.near')]
-------------------------------------------------------------------------------
Synset('dependable.s.04')
[Lemma('dependable.s.04.dependable'), Lemma('dependable.s.04.good'), Lemma('dependable.s.04.safe'), Lemma('dependable.s.04.secure')]
-------------------------------------------------------------------------------
Synset('good.s.12')
[Lemma('good.s.12.good'), Lemma('good.s.12.right'), Lemma('good.s.12.ripe')]
-------------------------------------------------------------------------------
Synset('good.s.13')
[Lemma('good.s.13.good'), Lemma('good.s.13.well')]
-------------------------------------------------------------------------------
Synset('effective.s.04')
[Lemma('effective.s.04.effective'), Lemma('effective.s.04.good'), Lemma('effective.s.04.in_effect'), Lemma('effective.s.04.in_force')]
-------------------------------------------------------------------------------
Synset('good.s.15')
[Lemma('good.s.15.good')]
-------------------------------------------------------------------------------
Synset('good.s.16')
[Lemma('good.s.16.good'), Lemma('good.s.16.serious')]
-------------------------------------------------------------------------------
Synset('good.s.17')
[Lemma('good.s.17.good'), Lemma('good.s.17.sound')]
-------------------------------------------------------------------------------
Synset('good.s.18')
[Lemma('good.s.18.good'), Lemma('good.s.18.salutary')]
-------------------------------------------------------------------------------
Synset('good.s.19')
[Lemma('good.s.19.good'), Lemma('good.s.19.honest')]
-------------------------------------------------------------------------------
Synset('good.s.20')
[Lemma('good.s.20.good'), Lemma('good.s.20.undecomposed'), Lemma('good.s.20.unspoiled'), Lemma('good.s.20.unspoilt')]
-------------------------------------------------------------------------------
Synset('good.s.21')
[Lemma('good.s.21.good')]
-------------------------------------------------------------------------------
Synset('well.r.01')
[Lemma('well.r.01.well'), Lemma('well.r.01.good')]
-------------------------------------------------------------------------------
Synset('thoroughly.r.02')
[Lemma('thoroughly.r.02.thoroughly'), Lemma('thoroughly.r.02.soundly'), Lemma('thoroughly.r.02.good')]
-------------------------------------------------------------------------------
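
For the specific case in the question (the noun related to an adjective), WordNet also records derivational links between lemmas, which NLTK exposes as `Lemma.derivationally_related_forms()`. The following is a minimal sketch built on that call. The helper names `related_nouns` and `nouns_for_adjective` are hypothetical, and the choice to look only at lemmas whose name matches the query word is an assumption about what is wanted here.

```python
def related_nouns(word, synsets):
    """Collect nouns derivationally related to `word` used as an adjective.

    Hypothetical helper: `synsets` is any iterable of objects that behave
    like NLTK Synsets (pos(), lemmas()).
    """
    nouns = set()
    for synset in synsets:
        # 'a' marks a head adjective, 's' a satellite adjective.
        if synset.pos() not in ("a", "s"):
            continue
        for lemma in synset.lemmas():
            # Skip synonyms in the same synset; keep only the query word.
            if lemma.name().lower() != word.lower():
                continue
            for related in lemma.derivationally_related_forms():
                if related.synset().pos() == "n":
                    nouns.add(related.name())
    return nouns


def nouns_for_adjective(word):
    """Look `word` up in WordNet through NLTK and return related nouns."""
    # Imported here so related_nouns() stays usable without NLTK installed.
    from nltk.corpus import wordnet
    return related_nouns(word, wordnet.synsets(word))
```

With NLTK and the WordNet data installed, `sorted(nouns_for_adjective('good'))` lists whatever nouns WordNet links to the adjective senses of "good". Coverage depends on which derivational links WordNet happens to record, so some adjectives may return an empty set.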

Answer 1: (score: -1)

I suggest looking at the WordNet corpus in NLTK. More information about WordNet here