Question

I want to extract part of a string. For example, given these strings:

s = "xxx text, yyy"
expected = "xxx text"

s = "xxx text yyy"
expected = "xxx text"

s = "xxx [text] yyy"
expected = "xxx [text]"

s = "xxx text,"
expected = "xxx text"

s = "xxx text "
expected = "xxx text"

My current code is:

re.search(r'xxx \S+', s)

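So my regex fails to exclude the comma ','. I know that [^,] excludes a comma, but how do I combine it with \S?

In my case I have to use '\S'; my only requirement is to exclude the comma from what \S matches.

I tried a lookahead assertion, re.search(r'xxx (\S+(?!\,))', s).groups(), but the comma was still captured.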

Answer 1

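There are two ways to do this:

import re

s = "xxx text, yyy"
# If there is ALWAYS a comma after the word, use a positive lookahead.
res = re.search(r'xxx \S+(?=,)', s)
print(res.group())
# Otherwise, match anything that is neither whitespace nor a comma.
res = re.search(r'xxx [^\s,]+', s)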
print(res.group())
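
As for why the attempt in the question still returns the comma, here is a small sketch (not part of the original answer) comparing the negative lookahead with the positive one. The variable names are only illustrative.

```python
import re

s = "xxx text, yyy"

# Negative lookahead from the question: \S+ greedily consumes "text,"
# (a comma is non-whitespace), and the next character is a space,
# so (?!,) is satisfied and the comma stays in the match.
neg = re.search(r'xxx \S+(?!,)', s).group()

# Positive lookahead: the match must end right before a comma,
# so the engine backtracks off the comma.
pos = re.search(r'xxx \S+(?=,)', s).group()

# With no comma in the input, the positive lookahead never succeeds.
no_comma = re.search(r'xxx \S+(?=,)', "xxx text yyy")

print(repr(neg))
print(repr(pos))
print(no_comma)
```

This is why the first pattern is only suitable when a comma is guaranteed to follow, while the character class [^\s,] works either way.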

Update for the new test cases:

import re

ar = [
    "xxx text, yyy",
    "xxx text yyy",
    "xxx [text] yyy",
    "xxx text,",
    "xxx text ",
    "xxx text",
]
for s in ar:
    # Either pattern works; pick one of them.
    print(re.search(r'xxx \S+?(?=,|\s|$)', s).group())
    print(re.search(r'xxx [^\s,]+', s).group())
    print()  # blank line between test cases

Output:

xxx text
xxx text

xxx text
xxx text

xxx [text]
xxx [text]

xxx text
xxx text

xxx text
xxx text

xxx text
xxx text
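
If \S really has to stay in the pattern, as the question requires, one more option is to guard each \S with a negative lookahead. This is a sketch of that idea rather than something from the answer above; the helper name extract is hypothetical.

```python
import re

# (?:(?!,)\S)+ reads: one or more non-whitespace characters, each of
# which is checked not to be a comma before it is consumed.
PATTERN = re.compile(r'xxx (?:(?!,)\S)+')

def extract(s):
    """Hypothetical helper: return the matched text, or None if no match."""
    m = PATTERN.search(s)
    return m.group() if m else None

cases = [
    "xxx text, yyy",
    "xxx text yyy",
    "xxx [text] yyy",
    "xxx text,",
    "xxx text ",
    "xxx text",
]
results = [extract(s) for s in cases]
for s, r in zip(cases, results):
    print(repr(s), '->', repr(r))
```

For these inputs it behaves the same as [^\s,]+, which is shorter and usually easier to read; the lookahead form is mainly useful when the excluded item is more than a single character.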

Answer 2

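You can replace \S with \w like this, provided the text you want consists only of word characters (\w does not match brackets, so "xxx [text] yyy" will not match):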

re.search(r'xxx \w+', s)
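
A quick check of how \w+ compares with [^\s,]+ on two of the question's inputs, as a sketch with illustrative variable names:

```python
import re

plain = "xxx text, yyy"
bracketed = "xxx [text] yyy"

# \w+ stops at the comma because a comma is not a word character.
w_plain = re.search(r'xxx \w+', plain)

# "[" is not a word character either, so \w+ cannot match at all here.
w_bracketed = re.search(r'xxx \w+', bracketed)

# The character class only excludes whitespace and commas.
cls_bracketed = re.search(r'xxx [^\s,]+', bracketed)

print(w_plain.group())
print(w_bracketed)
print(cls_bracketed.group())
```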
