在动词/名词/形容词形式之间转换单词

时间:2013-01-23 21:01:48

标签: python nlp nltk wordnet

我想要一个python库函数,它可以翻译/转换不同的语音部分。有时它应输出多个单词(例如“编码器”和“代码”都是动词“代码”中的名词,一个是主体,另一个是对象)

# :: String => List of String
print verbify('writer') # => ['write']
print nounize('written') # => ['writer']
print adjectivate('write') # => ['written']

我主要关心的是动词< =>名词,我想写的笔记程序。即我可以写“咖啡因拮抗A1”或“咖啡因是A1拮抗剂”,并且通过一些NLP可以发现它们的意思相同。 (我知道这并不容易,并且它需要解析NLP并且不仅仅是标记,但我想破解原型)。

类似问题...... Converting adjectives and adverbs to their noun forms (这个答案仅限于根POS。我想在POS之间进行。)

ps称为语言学转换http://en.wikipedia.org/wiki/Conversion_%28linguistics%29

4 个答案:

答案 0 :(得分:15)

这更像是一种启发式方法。我刚刚编写了这样的风格的应用程序。它使用来自wordnet的derivationally_related_forms()。我已经实现了nounify。我猜verbify的作用类似。从我测试过的工作做得很好:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets \
                   for l in s.lemmas if s.name.split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) \
                                    for l in    verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms \
                           for l in drf[1] if l.synset.name.split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w))/len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result

答案 1 :(得分:4)

这是一个理论上能够在名词/动词/形容词/副词形式之间转换单词的函数,我从here(最初由bogs编写,我相信)更新为符合nltk 3.2.5现在synset.lemmassysnset.name是函数。

from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):    
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a'and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w:-w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')

正如您在下面所看到的,它并没有那么好用。它无法在形容词和副词形式(我的具体目标)之间切换,但在其他情况下确实会给出一些有趣的结果。

>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]

希望能够为某人节省一点麻烦

答案 2 :(得分:3)

我知道这不能回答你的整个问题,但它确实回答了很大一部分问题。我会退房 http://nodebox.net/code/index.php/Linguistics#verb_conjugation 这个python库能够结合动词,并识别单词是动词,名词还是形容词。

示例代码

print en.verb.present("gave")
print en.verb.present("gave", person=3, negate=False)
>>> give
>>> gives

它还可以对单词进行分类。

print en.is_noun("banana")
>>> True

下载位于链接的顶部。

答案 3 :(得分:3)

一种方法可能是使用带有POS标签和字形映射的单词字典。如果您获得或创建了这样的字典(如果您可以访问任何常规字典的数据,如果所有字典都列出了单词的POS标签,以及所有派生表单的基本表单),那么您可以使用如下内容:

def is_verb(word):
    if word:
        tags = pos_tags(word)
        return 'VB' in tags or 'VBP' in tags or 'VBZ' in tags \
               or 'VBD' in tags or 'VBN' in tags:

def verbify(word):
    if is_verb(word):
        return word
    else:
       forms = []
       for tag in pos_tags(word):
           base = word_form(word, tag[:2])
           if is_verb(base):
              forms.append(base)
       return forms