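I have a function that checks a string (always a single word) against a dictionary. The dictionary contains words that should be replaced with their abbreviations.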
import re

dictionary = {
    'average': 'avg',
    'weighted': 'wgtd',
    'amount': 'amt'
}

# Assumed to be a module-level list; it collects words still longer than 30 characters.
stillToLong = []

def checkAgainstDict(text, counter):
    text = text.lower()
    original = text
    for key in dictionary.keys():
        # Match the key only when it is followed by a non-alphanumeric character.
        replaceRegex = key + r'(?=[^a-zA-Z0-9])'
        print('\t' + original + ' -> ' + text + '\t\t' + replaceRegex + ' => ' + dictionary[key])
        if re.match(replaceRegex, text):
            print('\tRegex: ' + replaceRegex)
            text = re.sub(replaceRegex, dictionary[key], text)
            print('\t\tText: ' + text)
        elif key in text:
            print('\t\tNot Replaced: ' + text + '\t\t' + replaceRegex + ' => ' + dictionary[key])
    # Count the word if it is short enough after replacement; otherwise record it.
    if len(text) <= 30:
        counter += 1
    else:
        stillToLong.append(text)
    return counter
With the input ('weighted_average_starting_amount_term', 0) it only partly works: the end result is 'wgtd_average_starting_amount_term'.
When I debug this case, I see:
    weighted_average_starting_amount_term -> weighted_average_starting_amount_term        amount(?=[^a-zA-Z0-9]) => amt
        Not Replaced: weighted_average_starting_amount_term        amount(?=[^a-zA-Z0-9]) => amt
    weighted_average_starting_amount_term -> weighted_average_starting_amount_term        weight(?=[^a-zA-Z0-9]) => wgt
        Not Replaced: weighted_average_starting_amount_term        weight(?=[^a-zA-Z0-9]) => wgt
    weighted_average_starting_amount_term -> weighted_average_starting_amount_term        weighted(?=[^a-zA-Z0-9]) => wgtd
    Regex: weighted(?=[^a-zA-Z0-9])
        Text: wgtd_average_starting_amount_term
    weighted_average_starting_amount_term -> wgtd_average_starting_amount_term        average(?=[^a-zA-Z0-9]) => avg
        Not Replaced: wgtd_average_starting_amount_term        average(?=[^a-zA-Z0-9]) => avg
For additional context, this is how the function is called:
for t in table:
    counter = checkAgainstDict(t, counter)
I have also tried passing the re.IGNORECASE flag to re.sub.
My regex is most likely wrong, but when I test it in a regex fiddle and in the Atom / Sublime replace function, it works.
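One difference between the two environments that may explain this: a fiddle or an editor's replace function searches the whole string, while re.match only tries to match at the very beginning of the string. The following is a minimal sketch of that difference, using the same pattern shape as replaceRegex and the intermediate text from the debug output; it is a diagnosis aid, not a confirmed root cause for every input.

```python
import re

# Same pattern shape as replaceRegex in the function above.
pattern = 'average' + r'(?=[^a-zA-Z0-9])'
text = 'wgtd_average_starting_amount_term'

# re.match is anchored at position 0, so a key in the middle of the word is not found.
match_result = re.match(pattern, text)
# re.search scans the whole string, similar to an editor's find/replace.
search_result = re.search(pattern, text)

print(match_result)               # None
print(search_result is not None)  # True
```

This would also be consistent with 'weighted' being the only key replaced, since it is the only one that sits at the start of the input.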
The end result I am hoping for is 'wgtd_avg_starting_amt_term'.
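A sketch of one way to get that result, under some assumptions: the helper name abbreviate is hypothetical, the keys are combined into a single pattern, and the boundary check is changed from the original lookahead to a pair of negative lookarounds so that a key at the very end of the string can also match. Whether that last change is wanted depends on the data.

```python
import re

# Copied from the question.
dictionary = {
    'average': 'avg',
    'weighted': 'wgtd',
    'amount': 'amt',
}

# Longest keys first, so a longer key wins over a shorter key that is its prefix.
keys = sorted(dictionary, key=len, reverse=True)

# Assumption: a key counts only when it is not directly next to a letter or digit
# on either side; the start and end of the string are allowed as boundaries.
pattern = re.compile(
    r'(?<![a-zA-Z0-9])(' + '|'.join(map(re.escape, keys)) + r')(?![a-zA-Z0-9])'
)

def abbreviate(text):
    """Replace every standalone dictionary key in text with its abbreviation."""
    return pattern.sub(lambda m: dictionary[m.group(1)], text.lower())

print(abbreviate('weighted_average_starting_amount_term'))
# wgtd_avg_starting_amt_term
```

Because re.sub already replaces every occurrence it finds anywhere in the string, no separate re.match check is needed before calling it.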
Thanks.