re.sub() replacement is not applied to every key when looping over dictionary keys

Time: 2019-05-27 12:00:55

Tags: python regex python-3.x replace

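I have a function that checks a string (always a single word) against a dictionary. The dictionary contains words that need to be replaced with their abbreviations.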

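import re

# Words to replace, mapped to their abbreviations.
dictionary = {
    'average': 'avg',
    'weighted': 'wgtd',
    'amount': 'amt'
}

# Not defined in the question; assumed here to be a module-level list.
stillToLong = []

def checkAgainstDict(text, counter):
    text = text.lower()
    original = text
    for key in dictionary.keys():
        # Match the key only when a non-alphanumeric character follows it.
        replaceRegex = key + r'(?=[^a-zA-Z0-9])'
        print('\t' + original + ' -> ' + text + '\t\t' + replaceRegex + ' => ' + dictionary[key])
        # Note: re.match only tests the pattern at the start of text.
        if re.match(replaceRegex, text):
            print('\tRegex: ' + replaceRegex)
            text = re.sub(replaceRegex, dictionary[key], text)
            print('\t\tText: ' + text)
        elif key in text:
            # The key occurs in text, but the regex check above did not succeed.
            print('\t\tNot Replaced: ' + text + '\t\t' + replaceRegex + ' => ' + dictionary[key])

    # Count results that fit in 30 characters; collect the rest.
    if len(text) <= 30:
        counter += 1
    else:
        stillToLong.append(text)
    return counter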

With the input ('weighted_average_starting_amount_term', 0) the replacement only partly works: the final result is 'wgtd_average_starting_amount_term'.

When I debug it, I see:

weighted_average_starting_amount_term -> weighted_average_starting_amount_term          amount(?=[^a-zA-Z0-9]) => amt
       Not Replaced: weighted_average_starting_amount_term             amount(?=[^a-zA-Z0-9]) => amt        
weighted_average_starting_amount_term -> weighted_average_starting_amount_term          weight(?=[^a-zA-Z0-9]) => wgt
       Not Replaced: weighted_average_starting_amount_term             weight(?=[^a-zA-Z0-9]) => wgt
weighted_average_starting_amount_term -> weighted_average_starting_amount_term          weighted(?=[^a-zA-Z0-9]) => wgtd
       Regex: weighted(?=[^a-zA-Z0-9])
       Text: wgtd_average_starting_amount_term
weighted_average_starting_amount_term -> wgtd_average_starting_amount_term              average(?=[^a-zA-Z0-9]) => avg
       Not Replaced: wgtd_average_starting_amount_term         average(?=[^a-zA-Z0-9]) => avg
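One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: `re.match` only tries the pattern at position 0 of the string, whereas `re.search` scans the whole string. That would explain why only 'weighted', which sits at the very start, gets past the `if`. The variable names below are illustrative and not part of the original code.

```python
import re

text = 'weighted_average_starting_amount_term'
pattern = r'average(?=[^a-zA-Z0-9])'

# re.match is anchored at the start of the string, so 'average' is not found.
match_result = re.match(pattern, text)
print(match_result)           # None

# re.search scans the whole string and finds the same pattern.
search_result = re.search(pattern, text)
print(search_result.span())   # (9, 16)
```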

For additional context, this is how the function is called:

for t in table:
    counter = checkAgainstDict(t, counter)

I have also tried passing the re.IGNORECASE flag to re.sub.

My regex may well be wrong, but when I test it in a regex fiddle and in the replace feature of Atom / Sublime, it works.

The end result I am hoping for is 'wgtd_avg_starting_amt_term'.
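A minimal sketch of one way to reach that result, assuming the `re.match` guard is the culprit: call `re.sub` directly, since it already scans the whole string and leaves the text unchanged when nothing matches. The helper name `abbreviate` is hypothetical, and the negative lookahead `(?![a-zA-Z0-9])` is a substitution for the original `(?=[^a-zA-Z0-9])` so that a key at the very end of the string is also replaced.

```python
import re

# Same mapping as in the question.
dictionary = {
    'average': 'avg',
    'weighted': 'wgtd',
    'amount': 'amt',
}

def abbreviate(text):
    """Hypothetical helper: replace each dictionary key found anywhere in text."""
    text = text.lower()
    for key, abbreviation in dictionary.items():
        # (?![a-zA-Z0-9]) also succeeds at the end of the string,
        # unlike (?=[^a-zA-Z0-9]), which requires a following character.
        pattern = re.escape(key) + r'(?![a-zA-Z0-9])'
        # re.sub searches the whole string, so no re.match check is needed.
        text = re.sub(pattern, abbreviation, text)
    return text

print(abbreviate('weighted_average_starting_amount_term'))
# wgtd_avg_starting_amt_term
```

Like the original pattern, this sketch only checks the character after the key, so a key embedded at the end of a longer word would still be replaced. A leading check such as `(?<![a-zA-Z0-9])` could be added if that matters.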

Thanks.

0 answers:

No answers