Question

My goal is to select `hello_kitty.dat` from strings like `Lorem 'hello_kitty.dat' ipsum.`

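I wrote this snippet, which works to some extent for smaller strings. From teststring it selects one or more (+) word characters (\w), then a dot (\.), then three word characters (\w{3}), and substitutes the whole selection with x.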

>>> import re
>>> teststring = "Lorem 'hello_kitty.dat' ipsum."
>>> print(re.sub(r'\w+\.\w{3}', "x", teststring))
Lorem 'x' ipsum.

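However, how can I modify the code so that it selects everything between the single quotes, even when the text after \w{3} doesn't follow my pattern at all?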

For example, teststring could be "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92", but in that case I don't want to select hello_kitty.cmd?command92, because it is not inside single quotes.
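To see why the original pattern is not enough here, the sketch below runs `\w+\.\w{3}` against the second test string with `re.findall` (the variable names are just for illustration). Because the pattern has no notion of quotes, it matches `hello_kitty.cmd` both inside and outside the quotes, and each match stops right after `cmd`, leaving `?command91` untouched.

```python
import re

teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"

# \w+\.\w{3} knows nothing about quotes, so it finds both filenames,
# and each match ends right after the three characters following the dot.
matches = re.findall(r"\w+\.\w{3}", teststring)
print(matches)  # ['hello_kitty.cmd', 'hello_kitty.cmd']

# Substituting therefore also rewrites the unquoted filename,
# and the "?command91" tail inside the quotes survives.
print(re.sub(r"\w+\.\w{3}", "x", teststring))
# Lorem 'x?command91' ipsum x?command92
```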

Answer 1

You can use:

import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'\w+\.\w{3}[^']*'", "'x'", teststring))
# => Lorem 'x' ipsum hello_kitty.cmd?command92

See the Python demo.

The pattern now matches:

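' - a single quote
\w+ - one or more word characters
\. - a literal dot
\w{3} - exactly three word characters
[^']* - a negated character class matching zero or more characters other than a single quote
' - a single quote.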
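The token-by-token breakdown above can also be written directly into the pattern using `re.VERBOSE`, which ignores unescaped whitespace outside character classes and allows `#` comments. This is a sketch of the same regex with the same behaviour, only annotated; the variable names are illustrative.

```python
import re

# Same pattern as in the answer, annotated with re.VERBOSE.
pattern = re.compile(
    r"""
    '        # opening single quote
    \w+      # one or more word characters
    \.       # a literal dot
    \w{3}    # exactly three word characters
    [^']*    # zero or more characters that are not a single quote
    '        # closing single quote
    """,
    re.VERBOSE,
)

teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(pattern.sub("'x'", teststring))
# Lorem 'x' ipsum hello_kitty.cmd?command92
```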

Answer 2

To add my two cents, you could use:

'[^']+' # quotes with a negated character class in between

In Python, this would be:

import re

string = """
"Lorem 'hello_kitty.dat' ipsum."
"Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
"""

rx = re.compile(r"'[^']+'")
string = rx.sub("x", string)
print(string)

# "Lorem x ipsum."
# "Lorem x ipsum hello_kitty.cmd?command92"
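A small variation on the same idea, shown as a sketch: wrapping the negated class in a capturing group lets `re.findall` return only the text between the quotes, and writing the quotes back in the replacement keeps them in the output. Note that `+` requires at least one character, so an empty pair `''` is not matched, while `*` would match it. The sample string here is made up for illustration.

```python
import re

text = "Lorem 'hello_kitty.cmd?command91' ipsum '' hello_kitty.cmd?command92"

# The capturing group makes findall return just the quoted contents.
print(re.findall(r"'([^']+)'", text))  # ['hello_kitty.cmd?command91']

# With * instead of +, the empty pair '' is matched as well.
print(re.findall(r"'([^']*)'", text))  # ['hello_kitty.cmd?command91', '']

# Replace the contents but keep the surrounding quotes.
print(re.sub(r"'[^']+'", "'x'", text))
# Lorem 'x' ipsum '' hello_kitty.cmd?command92
```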

Answer 3

Just use a non-greedy regular expression:

import re
teststring = "Lorem 'hello_kitty.cmd?command91' ipsum hello_kitty.cmd?command92"
print(re.sub(r"'.*?'", "'x'", teststring))

This returns Lorem 'x' ipsum hello_kitty.cmd?command92

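The regular expression '.*?' matches everything between single quotes, but because *? is non-greedy it takes the shortest possible string, stopping at the first closing quote.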
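To make the difference concrete, here is a sketch comparing the greedy `'.*'` with the non-greedy `'.*?'` on a made-up string containing two quoted sections. The greedy version runs from the first quote to the last one and swallows the text in between; the non-greedy version stops at the first closing quote, giving one match per quoted section.

```python
import re

text = "Lorem 'a.dat' ipsum 'b.cmd?x' dolor"

# Greedy: .* grabs as much as possible, so one match spans both quoted parts.
print(re.findall(r"'.*'", text))   # ["'a.dat' ipsum 'b.cmd?x'"]

# Non-greedy: .*? grabs as little as possible, one match per quoted part.
print(re.findall(r"'.*?'", text))  # ["'a.dat'", "'b.cmd?x'"]

print(re.sub(r"'.*?'", "'x'", text))  # Lorem 'x' ipsum 'x' dolor
```

One more difference from the negated-class approach: `.` does not match a newline unless `re.DOTALL` is set, whereas `[^']` does.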

Regex - select the expression between single quotes
