Question

我对还另一个正则表达式问题感到难过，但过去一周这让我发疯了。

我试图在python中使用正则表达式来替换一些看起来像这样的文本：

text = """some stuff
line with text
other stuff
[code language='cpp']
#include <cstdio>

int main() {
    printf("Hello");
}
[/code]
Maybe some
other text"""

我想要做的是捕获[code]标记内的文字，在每行前面添加一个标签（\t），然后用这些新行替换所有[code]...[/code]标签前置。也就是说，我希望结果看起来像：

"""some stuff
line with text
other stuff

    #include <cstdio>

    int main() {
        printf("Hello");
    }

Maybe some
other text"""

我正在使用以下代码段。

class CodeParser(object):
    """Parse a blog post and turn it into markdown."""

    def __init__(self):
        self.regex = re.compile('.*\[code.*?\](?P<code>.*)\[/code\].*',
                                re.DOTALL)

    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        code = self.regex.match(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)
        return self.regex.sub('\n%s\n' % code, text)

问题在于它匹配code标签之前和之后的所有字符，因为初始和最终.*，当我执行替换时，这些被删除。如果我删除.* s，则re不再匹配任何内容。

我认为这可能是换行符的问题，因此我尝试将所有'\n'替换为'¬'，执行匹配，然后将'¬'更改回{ {1}}，但我对这种方法没有任何好运。

如果有人有更好的方法来完成我想要完成的任务，我愿意接受建议。

谢谢。

Answer 1

你走在正确的轨道上。而不是regex.match，使用regex.search。这样你就可以摆脱前导和尾随.*s。

Try this:
    def __init__(self):
        self.regex = re.compile('\[code.*?\](?P<code>.*)\[/code\]',
                                re.DOTALL)


    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        # Here we are using search which finds the pattern anywhere in the 
        # string rather than just at the beginning
        code = self.regex.search(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)

        return self.regex.sub('\n%s\n' % code, text)

Python多行正则表达式替换

1 个答案: