Expanding English contractions in Python using regular expressions

Time: 2018-10-05 10:49:43

Tags: python regex nlp

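I am trying to expand English contractions ("I'll" and so on). I have saved all the contractions and their expansions as a dictionary in a file. For example: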

CONTRACTION_MAP ={"ain't": "is not","aren't": "are not","can't": "cannot","can't've": "cannot have","'cause": "because","could've": "could have","couldn't": "could not","couldn't've": "could not have"..........etc}

Below is the code that expands the contractions:

from contractions import CONTRACTION_MAP
import re

def expand_contractions(sentence, contraction_map):
    # Build one alternation pattern out of all the dictionary keys.
    contractions_pattern = re.compile('({})'.format('|'.join(contraction_map.keys())), flags=re.IGNORECASE | re.DOTALL)

    def expand_match(contraction):
        # Look up the matched text in the dictionary and return its expansion.
        match = contraction.group(0)
        expanded_contraction = contraction_map.get(match)
        return expanded_contraction

    expanded_sentence = contractions_pattern.sub(expand_match, sentence)
    return expanded_sentence

My problem is that if I use "can't've" in the text, it gets expanded to "cannot've" instead of "cannot have".

sentence = "Paul can't've ice cream as he is suffering with cough"
print(expand_contractions(sentence,CONTRACTION_MAP))
Output => Paul cannot've ice cream as he is suffering with cough

Can anyone help me figure out what changes I need to make to the code to get the expected output?
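One likely cause is that regex alternation is ordered: the engine takes the first alternative that matches, so "can't" wins before "can't've" is ever tried. A minimal sketch of one possible fix follows, assuming the keys can be sorted longest first. SAMPLE_MAP is a hypothetical, shortened stand-in for CONTRACTION_MAP. The lowercase lookup and the capitalization handling are additions that are not in the original code; they are there because with re.IGNORECASE a match such as "Can't" is not a key in the dictionary.

```python
import re

# Hypothetical, shortened stand-in for the CONTRACTION_MAP from the question.
SAMPLE_MAP = {
    "can't": "cannot",
    "can't've": "cannot have",
    "couldn't": "could not",
    "couldn't've": "could not have",
}


def expand_contractions(sentence, contraction_map):
    # Lowercase the keys so a case-insensitive match can always be looked up.
    lookup = {key.lower(): value for key, value in contraction_map.items()}
    # Longest keys first, so "can't've" is tried before its prefix "can't".
    keys = sorted(lookup, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters.
    pattern = re.compile("|".join(re.escape(key) for key in keys), flags=re.IGNORECASE)

    def expand_match(match):
        text = match.group(0)
        expanded = lookup[text.lower()]
        # Assumption: keep a leading capital if the matched text had one.
        if text[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return pattern.sub(expand_match, sentence)


sentence = "Paul can't've ice cream as he is suffering with cough"
print(expand_contractions(sentence, SAMPLE_MAP))
# Paul cannot have ice cream as he is suffering with cough
```

Sorting by length is only one way to do this. Adding word boundaries around the pattern might also help in some cases, but keys that start with an apostrophe, such as "'cause", would need extra care there.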

0 answers:

No answers yet