Question

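After finding the fastest string replacement algorithm in this thread, I have been trying to modify one of them to suit my needs, specifically this one by gnibbler.

I will explain the problem again here, along with the issue I am having.

Say I have a string that looks like this: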

str = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

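You will notice many places in the string where an ampersand is followed by a character (such as "&y" and "&c"). I need to replace these characters with the appropriate value from a dictionary, like so: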

dict = {"y":"\033[0;30m",
        "c":"\033[0;31m",
        "b":"\033[0;32m",
        "Y":"\033[0;33m",
        "u":"\033[0;34m"}

Using gnibbler's solution from my previous thread, I have this as my current solution:

myparts = tmp.split('&')
myparts[1:]=[dict.get(x[0],"&"+x[0])+x[1:] for x in myparts[1:]]
result = "".join(myparts)

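This replaces the characters correctly and does not fail on characters that are not found. The only problem is that there is no simple way to actually keep an ampersand in the output. The easiest way I can think of is to change my dictionary to contain: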

dict = {"y":"\033[0;30m",
        "c":"\033[0;31m",
        "b":"\033[0;32m",
        "Y":"\033[0;33m",
        "u":"\033[0;34m",
        "&":"&"}

and change my "split" call to do a regex split on ampersands that are NOT followed by other ampersands.

>>> import re
>>> tmp = "&yI &creally &blove A && W &uRootbeer."
>>> tmp.split('&')
['', 'yI ', 'creally ', 'blove A ', '', ' W ', 'uRootbeer.']
>>> re.split('MyRegex', tmp)
['', 'yI ', 'creally ', 'blove A ', '&W ', 'uRootbeer.']
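To make the problem concrete: the empty string that the plain split produces for "&&" is what breaks the current list comprehension, because `x[0]` on an empty string raises IndexError. A minimal sketch, assuming Python 3; the names `codes` and `naive_replace` are made up for illustration:

```python
codes = {"y": "\033[0;30m",
         "c": "\033[0;31m",
         "b": "\033[0;32m",
         "Y": "\033[0;33m",
         "u": "\033[0;34m",
         "&": "&"}

def naive_replace(text):
    """Current approach: plain split on '&', then look up the first character."""
    parts = text.split('&')
    # x[0] assumes every piece after a '&' is non-empty
    parts[1:] = [codes.get(x[0], "&" + x[0]) + x[1:] for x in parts[1:]]
    return "".join(parts)

print(repr(naive_replace("The &yquick fox")))  # single ampersands work

try:
    naive_replace("A && W")  # split gives ['A ', '', ' W']
except IndexError as exc:
    print("fails on a doubled ampersand:", exc)
```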

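Basically, I need a regex that will split on the first ampersand of a pair, and on every single ampersand, so that I can escape it via my dictionary.

If anyone has a better solution, please feel free to let me know.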

Answer 1

You could use a negative lookbehind (assuming the regex engine in question supports it) to match only ampersands that do not follow another ampersand.

/(?<!&)&/
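As a rough sketch of how that pattern could be plugged into the split-based approach from the question, assuming Python 3 (whose `re` module supports this kind of lookbehind). The names `CODES` and `colorize` are hypothetical, and runs of three or more ampersands get no special handling here:

```python
import re

CODES = {"y": "\033[0;30m",
         "c": "\033[0;31m",
         "b": "\033[0;32m",
         "Y": "\033[0;33m",
         "u": "\033[0;34m",
         "&": "&"}

def colorize(text):
    # Split only on an ampersand that is not preceded by another ampersand,
    # so the second '&' of '&&' stays at the start of the next piece.
    parts = re.split(r'(?<!&)&', text)
    # An empty piece can only come from a trailing '&'; put it back as-is.
    parts[1:] = [CODES.get(p[0], '&' + p[0]) + p[1:] if p else '&'
                 for p in parts[1:]]
    return ''.join(parts)

tmp = "&yI &creally &blove A && W &uRootbeer."
print(re.split(r'(?<!&)&', tmp))
print(repr(colorize(tmp)))
```

With the "&": "&" entry in the dictionary, the piece that starts with the second ampersand maps back to a single literal "&".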

Answer 2

Maybe loop while (q = str.find('&', p)) != -1, then append the left side (p + 2 to q - 1) and the replacement value.
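One possible reading of that loop, written out as a sketch in Python 3. The function name `replace_codes` and the choice to leave a trailing "&" untouched are assumptions, not something the answer specifies:

```python
CODES = {"y": "\033[0;30m",
         "c": "\033[0;31m",
         "b": "\033[0;32m",
         "Y": "\033[0;33m",
         "u": "\033[0;34m",
         "&": "&"}

def replace_codes(text, codes=CODES):
    out = []
    pos = 0
    while True:
        amp = text.find('&', pos)
        # No more ampersands, or the ampersand is the last character:
        # copy the rest of the string unchanged and stop.
        if amp == -1 or amp + 1 >= len(text):
            out.append(text[pos:])
            break
        out.append(text[pos:amp])              # text left of the ampersand
        key = text[amp + 1]                    # character after the ampersand
        out.append(codes.get(key, '&' + key))  # unknown keys are kept as-is
        pos = amp + 2                          # skip '&' and the key character
    return ''.join(out)

print(repr(replace_codes("The &yquick &cbrown fox & cat &&cat")))
```

Because the scan always jumps two characters past each ampersand, "&&" is consumed as one unit and never re-read as the start of a new code.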

Answer 3

I think this does the trick:

import re

def fix(text):
    dict = {"y":"\033[0;30m",
            "c":"\033[0;31m",
            "b":"\033[0;32m",
            "Y":"\033[0;33m",
            "u":"\033[0;34m",
            "&":"&"}

    myparts = re.split(r'&(&*)', text)
    myparts[1:]=[dict.get(x[0],"&"+x[0])+x[1:] if len(x) > 0 else x for x in myparts[1:]]
    result = "".join(myparts)
    return result


print(fix("The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"))
print(fix("&yI &creally &blove A && W &uRootbeer."))

Answer 4

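re.sub will do what you want. It takes a regular expression pattern and can take a function that processes the match and returns the replacement. If the character following the & is not in the dictionary, no replacement is made. && is replaced with &, which allows escaping an & that is followed by a character in the dictionary.

Also, 'str' and 'dict' are bad variable names because they shadow the built-ins of the same name.

In 's' below, '& cat' will not be affected, and '&&cat' will become "&cat", suppressing the &c translation.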
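A small illustration of why the shadowing matters, assuming Python 3; `shadow_demo` is a made-up name:

```python
def shadow_demo():
    # Rebinding the name hides the built-in str type inside this scope.
    str = "The &yquick fox"
    try:
        str(42)  # now tries to call a string object, not the built-in
    except TypeError as exc:
        return type(exc).__name__
    return "no error"

print(shadow_demo())
```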

import re

s = "The &yquick &cbrown &bfox & cat &&cat &Yjumps over the &ulazy dog"

D = {"y":"\033[0;30m",
     "c":"\033[0;31m",
     "b":"\033[0;32m",
     "Y":"\033[0;33m",
     "u":"\033[0;34m",
     "&":"&"}

def func(m):
    return D.get(m.group(1),m.group(0))

print(repr(re.sub(r'&(.)', func, s)))

Output:

'The \x1b[0;30mquick \x1b[0;31mbrown \x1b[0;32mfox & cat &cat \x1b[0;33mjumps over the \x1b[0;34mlazy dog'

-Mark

Regex to split only when a specific character is not part of a pair

4 answers: