Change words in a string using a dictionary (Python)

Time: 2016-08-18 07:54:40

Tags: python string dictionary

I have the following message:

msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."

I want to change words such as "Drive" to "Dr" or "Street" to "St":

expected_msg = "Cowlishaw St and Athllon Dr Greenway now free of obstruction"

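I also have a "conversion" mapping.

How do I check whether each word in the message appears in it and, if so, replace it using "conversion"? "conversion" is a dictionary in which a word such as "Drive" is the key and the value is "Dr".

This is what I have done so far: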

def convert_message(msg, conversion):
    msg = msg.translate({ord(i): None for i in ".,"})
    tokens = msg.strip().split(" ")
    for x in msg:
         if x in keys (conversion):


    return " ".join(tokens)

1 answer:

Answer 0 (score: 0)

Isn't it simply:

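conversion = {'Drive': 'Dr'}

def convert_message(msg, conversion):
    tokens = msg.split()
    # Replace each token that has an entry in the dictionary
    for index, token in enumerate(tokens):
        if token in conversion:
            tokens[index] = conversion[token]
    return ' '.join(tokens)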

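However, this does not work for a sentence like "Obstruction on Cowlishaw Street.", because the token is now "Street." (with the trailing period), which is not a key in the dictionary. Perhaps you should use a regular expression with re.sub instead: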

import re
def convert_message(msg, conversion):
    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+', translate, msg)

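Here re.sub finds each run of one or more (+) consecutive alphanumeric characters (\w, which also matches the underscore) and, for every such match, calls the given function with the match object as its argument; the matched word can be retrieved with match.group(0). The function should return the replacement for the given match: here, the dictionary value if the word is found in the dictionary, otherwise the original word unchanged.

Thus: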

>>> msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."
>>> convert_message(msg, {'Drive': 'Dr', 'Street': 'St'})
'Cowlishaw St & Athllon Dr, Greenway now free of obstruction.'
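To see which pieces of the message the callback actually receives, it can help to list the \w+ matches with re.findall. The sketch below also shows a more compact way to write the same lookup using dict.get; the variable names words and converted are only illustrative and not part of the original answer.

```python
import re

msg = "Cowlishaw Street & Athllon Drive, Greenway now free of obstruction."
conversion = {'Drive': 'Dr', 'Street': 'St'}

# Every run of word characters; these are the strings the callback sees.
words = re.findall(r'\w+', msg)

# Same replacement logic as translate(), written with dict.get:
# look the word up and fall back to the word itself if it is missing.
converted = re.sub(r'\w+', lambda m: conversion.get(m.group(0), m.group(0)), msg)

print(words)
print(converted)
```

Punctuation and spaces never reach the callback, so they pass through to the result untouched.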

As for &amp;: if the message contains HTML entities like this one, on Python 3.4+ you should use html.unescape to decode them:

>>> import html
>>> html.unescape('Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.')
'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'

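This handles all known HTML entities. For older Python versions, see the alternatives on this question.

The regular expression \w+ does not match the & character; if you want to replace it as well, we can use the regular expression \w+|., which means: "any run of consecutive alphanumeric characters, or else any single character that is not part of such a run":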

import re
import html


def convert_message(msg, conversion):
    msg = html.unescape(msg)

    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+|.', translate, msg)

Then you can do:

>>> msg = 'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'
>>> convert_message(msg, {'Drive': 'Dr', '&': 'and',
...                       'Street': 'St', '.': '', ',': ''})
'Cowlishaw St and Athllon Dr Greenway now free of obstruction'
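As a rough illustration of why this works, the following sketch lists what \w+|. matches on a short sample string (sample and tokens are illustrative names, not from the answer). Each word arrives as one token, and every other character, including each space, arrives as a one-character token, so single characters such as "&" or "," can be dictionary keys too.

```python
import re

sample = "Street & Drive."

# Words come out whole; every other character is its own token.
tokens = re.findall(r'\w+|.', sample)

print(tokens)
```

Note that without the re.DOTALL flag the dot does not match a newline, so newlines in a multi-line message are skipped by the pattern and left as they are.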