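I am trying to expand English contractions such as "I'll". I have saved all the contractions and their expansions as a dictionary in a file. For example: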
CONTRACTION_MAP ={"ain't": "is not","aren't": "are not","can't": "cannot","can't've": "cannot have","'cause": "because","could've": "could have","couldn't": "could not","couldn't've": "could not have"..........etc}
Below is the code that expands the contractions:
from contractions import CONTRACTION_MAP
import re

def expand_contractions(sentence, contraction_map):
    contractions_pattern = re.compile('({})'.format('|'.join(contraction_map.keys())), flags=re.IGNORECASE|re.DOTALL)
    def expand_match(contraction):
        match = contraction.group(0)
        expanded_contraction = contraction_map.get(match)
        return expanded_contraction
    expanded_sentence = contractions_pattern.sub(expand_match, sentence)
    return expanded_sentence
My problem is that if I use "can't've" in my text, it is expanded to "cannot've" instead of "cannot have".
sentence = "Paul can't've ice cream as he is suffering with cough"
print(expand_contractions(sentence,CONTRACTION_MAP))
Output => Paul cannot've ice cream as he is suffering with cough
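A likely cause, as far as I can tell, is the order of the alternatives in the compiled pattern: Python's `re` tries the alternatives of `a|b` from left to right and takes the first one that matches, so if "can't" appears before "can't've" in the joined keys, only "can't" is consumed and "'ve" is left behind. The snippet below is a minimal sketch of that behaviour with two hand-written patterns (the names `short_first` and `long_first` are made up for illustration), assuming the dictionary keys are iterated in the order shown above:

```python
import re

text = "can't've"

# Shorter key listed first, mirroring the order of the keys in the dictionary.
short_first = re.compile("(can't|can't've)")
# Longer key listed first.
long_first = re.compile("(can't've|can't)")

# The first alternative that matches wins, even if a later one is longer.
print(short_first.search(text).group(0))  # can't
print(long_first.search(text).group(0))   # can't've
```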
Can anyone help me figure out what changes I need to make to the code to get the expected output?
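One possible change, offered as a sketch rather than a definitive fix, is to sort the keys longest first before joining them, so that "can't've" is tried before "can't". The version below also escapes each key with `re.escape` and looks up the lowercased match, since the pattern is case-insensitive while the keys are assumed to be lowercase. `SAMPLE_MAP` and `expand_contractions_longest_first` are hypothetical names used only for this example; `SAMPLE_MAP` is a small stand-in for the full `CONTRACTION_MAP`.

```python
import re

# Small illustrative subset; the real CONTRACTION_MAP is assumed to have the same shape.
SAMPLE_MAP = {
    "can't": "cannot",
    "can't've": "cannot have",
    "could've": "could have",
}

def expand_contractions_longest_first(sentence, contraction_map):
    # Sort keys longest first so that "can't've" is tried before "can't".
    keys = sorted(contraction_map, key=len, reverse=True)
    pattern = re.compile(
        "({})".format("|".join(re.escape(key) for key in keys)),
        flags=re.IGNORECASE,
    )

    def expand_match(match):
        text = match.group(0)
        # Keys are assumed lowercase; fall back to the original text if no key is found.
        return contraction_map.get(text.lower(), text)

    return pattern.sub(expand_match, sentence)

sentence = "Paul can't've ice cream as he is suffering with cough"
print(expand_contractions_longest_first(sentence, SAMPLE_MAP))
# prints: Paul cannot have ice cream as he is suffering with cough
```

Note that this sketch does not preserve the capitalisation of the original word: "Can't" would become "cannot", which may or may not be acceptable depending on the use case.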