Question

I'm having some trouble here. I need to migrate a wiki along with its internal links. The links come in many different formats, one of which is: ["link"]

I tried this:

import re

def convert_link(s):  # illustrative wrapper so that the return statement is valid
    m = re.search(r'\[\"(\w+\s\w+)\"\]', s)
    if m:
        print(m.group(1))
        m2 = re.sub(r'\[\"(?:[^\]|]*\|)?([^\]|]*)\"\]', r'\1', s)
        return m2

But this doesn't work... Any ideas on the right way to do this with a regular expression?

Edit

It also needs to ignore any spaces in the link, so, for example, ["this is a test"] must work as well.

Answer 1

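One problem with your regex is that it requires exactly one word (\w+), one whitespace character (\s) and another word. So you won't match single-word links or longer ones (such as this is a test). Instead, you probably want one word followed by zero or more further words, each preceded by whitespace: (\w+(?:\s\w+)*).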
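A quick way to see the difference is to run the question's two-word pattern and the repeated-word pattern side by side. This is a minimal sketch; the sample link texts are made up for illustration.

```python
import re

# Pattern from the question: exactly two words separated by one whitespace character.
two_words = re.compile(r'\["(\w+\s\w+)"\]')
# Suggested fix: one word followed by zero or more "whitespace + word" repetitions.
any_words = re.compile(r'\["(\w+(?:\s\w+)*)"\]')

# Hypothetical sample links with one, two and four words.
samples = ['["link"]', '["wiki page"]', '["this is a test"]']

for text in samples:
    old = two_words.search(text)
    new = any_words.search(text)
    # Print the captured link text for each pattern, or None if it did not match.
    print(text, old.group(1) if old else None, new.group(1) if new else None)
```

Only the two-word sample matches the original pattern, while the repeated group matches all three. Because \s matches a single whitespace character, links containing runs of spaces would need \s+ instead.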

But you can also simply match anything that isn't a quote character, since your links must end with one:

>>> import re
>>> s = 'Some ["text"] with ["lots of links"] as an ["example"] for a ["regular expression"] that ["finds them all"].'
>>> re.findall(r'\["([^"]+)"\]', s)
['text', 'lots of links', 'example', 'regular expression', 'finds them all']
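Since the goal is to migrate the links rather than just list them, the same pattern can be passed to re.sub. The sketch below assumes the migration simply replaces each ["link"] with its bare text, as the r'\1' replacement in the question does. convert_links is a hypothetical helper name, and stripping spaces just inside the quotes is only one guess at what "ignore any spaces" should mean.

```python
import re

# Matches ["..."]; the captured group is everything between the quotes.
LINK = re.compile(r'\["([^"]+)"\]')

def convert_links(text):
    """Replace every ["link"] with its bare link text.

    Spaces just inside the quotes are stripped (an assumption about
    what "ignore any spaces" means for this migration).
    """
    return LINK.sub(lambda m: m.group(1).strip(), text)

print(convert_links('Some ["text"] with ["lots of links"] and [" spaced "].'))
```

To emit a different target link syntax, only the lambda's return value would need to change; the matching pattern stays the same.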
