How do I conjugate English words into the progressive form in Python?

Time: 2017-03-18 18:43:43

Tags: python nlp

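I looked at pattern.en's conjugate, but it only conjugates into a few forms, and I would rather not have to sit down and program all the exceptions to the rules that would let me do conjugations such as:

  • free - freeing
  • eat - eating
  • bathe - bathing
  • be - being
  • ban - banning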

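nltk has stemming, but it does not seem to have the reverse operation, at least judging from a search of StackOverflow. This seems like a very basic NLP task, but I can't find anything modern that does it in Python. Any general conjugation tool would be nice, although as far as I know the progressive form in English has no irregularities.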

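I am also trying to find out whether there are exceptions to the following rule, which might work as an alternative, written as a function: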

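def present_to_progressive(x):
    vowels = set(['a','e','i','o','u'])
    size = len(x)
    # Two-letter verbs: be -> being
    if size == 2:
        return x + 'ing'
    # -ie becomes -ying: lie -> lying
    elif x[size - 2:] == 'ie':
        return x[:(size-2)] + 'ying'
    # Ends in two consonants: just add -ing
    elif x[size - 1] not in vowels and x[size - 2] not in vowels:
        return x + 'ing'
    # Consonant + e: drop the e (bathe -> bathing)
    elif x[size - 1] == 'e' and x[size-2] not in vowels:
        return x[0:(size-1)] + 'ing'
    # Vowel + consonant: double the consonant only after
    # consonant-vowel-consonant (ban -> banning)
    elif x[size - 1] not in vowels and x[size-2] in vowels:
        if x[size - 3] not in vowels:
            return x + x[size-1] + 'ing'
        else:
            return x + 'ing'
    else:
        return x + 'ing'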

Edit: added a case for "ie" verbs
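One way to look for exceptions is to keep the expected pairs in a table and run any candidate rule against it. The sketch below uses the five example pairs from the question; `naive_progressive` and `find_mismatches` are hypothetical names introduced only for illustration, and the naive "just append -ing" rule is a stand-in for whatever function is being checked.

```python
# Expected pairs, taken from the examples listed in the question.
EXPECTED = {
    "free": "freeing",
    "eat": "eating",
    "bathe": "bathing",
    "be": "being",
    "ban": "banning",
}


def naive_progressive(verb):
    """Hypothetical stand-in candidate: append 'ing' with no spelling rules."""
    return verb + "ing"


def find_mismatches(candidate, expected=EXPECTED):
    """Return (verb, expected, got) for every pair the candidate gets wrong."""
    mismatches = []
    for verb, want in expected.items():
        got = candidate(verb)
        if got != want:
            mismatches.append((verb, want, got))
    return mismatches


for verb, want, got in find_mismatches(naive_progressive):
    print("CHECK: {} {} {}".format(verb, want, got))
```

Passing a different function to `find_mismatches` checks that function against the same table, so the table can grow as new exceptions turn up.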

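2 answers:

Answer 0: (score: 2)

There is a whole library for this kind of modification that does what you need. It is called pattern.en.

You can find it here: pattern.en

It is a great resource. Here is an excerpt from the conjugation tutorial on the website: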

conjugate(verb, 
    tense = PRESENT,        # INFINITIVE, PRESENT, PAST, FUTURE
   person = 3,              # 1, 2, 3 or None
   number = SINGULAR,       # SG, PL
     mood = INDICATIVE,     # INDICATIVE, IMPERATIVE, CONDITIONAL, SUBJUNCTIVE
   aspect = IMPERFECTIVE,   # IMPERFECTIVE, PERFECTIVE, PROGRESSIVE 
  negated = False,          # True or False
    parse = True)   

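It is very practical and very extensive!

Answer 1: (score: 1)

I think your code covers most cases. I checked it against a list of 620 irregular verbs taken from this site, and it misses about 84 cases.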
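Going by the signature quoted above, asking for the progressive form would look roughly like this. It is a sketch, not something verified against a particular release: it assumes the third-party `pattern` package is installed and that it exports the `PRESENT` and `PROGRESSIVE` constants named in the excerpt. The wrapper `progressive_via_pattern` is a hypothetical helper that returns None when the package is missing.

```python
try:
    # Third-party package; may not be installed in every environment.
    from pattern.en import conjugate, PRESENT, PROGRESSIVE
except ImportError:
    conjugate = None


def progressive_via_pattern(verb):
    """Return the present participle via pattern.en, or None if unavailable."""
    if conjugate is None:
        return None
    # tense and aspect are the keyword names shown in the excerpt above.
    return conjugate(verb, tense=PRESENT, aspect=PROGRESSIVE)


print(progressive_via_pattern("ban"))
```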

with open('/tmp/Verblist.vrb', 'rt') as f:
    err = 0
    for l in f:
        if l.startswith('>'):
            forms = l[1:].split(' ')
            guess = present_to_progressive(forms[0])
            if forms[4].lower() != guess.lower():
                print('CHECK: {} {} {}'.format(forms[0], forms[4], guess))
                err += 1
    print(err)

Simply adding 'w' and 'y' to the vowel list brings the list of possible errors down to 18 cases:

CHECK: Aby/Abey Abying/Abeying Aby/Abeying    -- Correct
CHECK: Eat Eating Eatting
CHECK: Fordo/Foredo Fordoing Fordo/Foredoing  -- Correct in one of the 2 variants
CHECK: Forget Foregetting Forgetting          -- Correct, the list has a typo
CHECK: Lie Lying Lieing                       -- Fixed in your second version
CHECK: Mischoose Mischoosins Mischoosing      -- Correct, the list has a typo
CHECK: Miswed Miswedding Misweding
CHECK: Outswim Outswimming Outswiming
CHECK: Overlie Overlying Overlieing           -- Fixed in your second version
CHECK: Quit Quitting Quiting
CHECK: Relearn Relearn Relearning
CHECK: Rewed Rewedding Reweding
CHECK: Rewet Rewetting Reweting
CHECK: Rewin Rewinning Rewining
CHECK: Swim Swimming Swiming
CHECK: Underlie Underlying Underlieing        -- Fixed in your second version
CHECK: Vex Vexing Vexxing
CHECK: Zinc Zincking Zincing

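The most important of these can be fixed by special-casing "lie" and improving the rule for doubling the final consonant. I think you can safely decide to ignore some of the very rare verbs.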
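A possible revision along those lines is sketched below. It is not the asker's function. It lowercases the input first (the original compares against lowercase vowels only, so a capitalized "Eat" is treated as starting with a consonant and comes out as "Eatting"), never doubles a final w, x or y, treats the "u" in "qu" as part of the consonant, and doubles only in one-syllable words. The names `to_progressive`, `doubles_final_consonant` and the `EXCEPTIONS` table are assumptions made for illustration, and the syllable count is a rough vowel-group heuristic.

```python
import re

VOWELS = set("aeiou")
NEVER_DOUBLED = set("wxy")  # final w, x, y are not doubled: show -> showing

# Hypothetical starter table for verbs the spelling rules cannot decide,
# e.g. multi-syllable verbs whose final syllable is stressed.
EXCEPTIONS = {"forget": "forgetting", "begin": "beginning"}


def doubles_final_consonant(word):
    """True for one-syllable words ending consonant + single vowel + consonant."""
    last, middle, before = word[-1], word[-2], word[-3]
    if last in VOWELS or last in NEVER_DOUBLED or middle not in VOWELS:
        return False
    # The "u" in "qu" acts as part of the consonant: quit -> quitting.
    before_is_qu = before == "u" and word[-4:-3] == "q"
    if before in VOWELS and not before_is_qu:
        return False
    # Rough syllable count: number of vowel groups, with "qu" collapsed.
    return len(re.findall(r"[aeiouy]+", word.replace("qu", "q"))) == 1


def to_progressive(verb, exceptions=EXCEPTIONS):
    """Return a lowercase -ing form using spelling rules plus an exception table."""
    word = verb.lower()
    if word in exceptions:
        return exceptions[word]
    if len(word) < 3:
        return word + "ing"            # be -> being
    if word.endswith("ie"):
        return word[:-2] + "ying"      # lie -> lying
    if word.endswith("e") and word[-2] not in "eoy":
        return word[:-1] + "ing"       # bathe -> bathing, but free -> freeing
    if doubles_final_consonant(word):
        return word + word[-1] + "ing" # ban -> banning
    return word + "ing"


for verb in ["free", "Eat", "bathe", "be", "ban", "swim", "quit", "vex", "lie"]:
    print(verb, to_progressive(verb))
```

Restricting doubling to one-syllable words keeps common verbs such as "visit" from becoming "visitting", at the cost of missing prefixed verbs like "rewed", which this sketch turns into "reweding" unless they are added to the exception table.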