Question

I am trying to use a regular expression to select only the groups of words inside quotes.

Example

Input:

this is 'a sentence' with less 'than twenty words'

Output:

['a sentence', 'than twenty words']

The regular expression I am using is:

'\'[\w]+[ ]+[[\w]+[ ]+]*[\w]+\''

But it only returns 'than twenty words'. In fact, it only returns quoted strings that contain two spaces.

Answer 1

Try this:

import re
re.findall(r"\'(\s*\w+\s+\w[\s\w]*)\'", input_string)
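As a rough sketch of why the pattern in the question behaves that way: square brackets do not group, so `[[\w]+` appears to be read as "one or more of `[` or a word character" and the later `]*` as "zero or more literal `]`". That leaves a pattern that needs three words separated by two runs of spaces. The `fixed` pattern below is only one possible rewrite, and the variable names are made up for illustration.

```python
import re
import warnings

text = "this is 'a sentence' with less 'than twenty words'"

# The pattern from the question, written as a raw string.
# "[[\w]+" is a character class containing "[" and \w, not a nested group,
# and "]*" is a literal "]" repeated zero or more times.
original = r"'[\w]+[ ]+[[\w]+[ ]+]*[\w]+'"

with warnings.catch_warnings():
    # Some Python versions warn about the "[[" sequence; silence that here.
    warnings.simplefilter("ignore")
    original_matches = re.findall(original, text)

# A non-capturing group (?: ... ) makes "spaces + word" repeatable,
# and the outer capturing group drops the quotes from the result.
fixed = r"'(\w+(?: +\w+)+)'"
fixed_matches = re.findall(fixed, text)

print(original_matches)  # ["'than twenty words'"]
print(fixed_matches)     # ['a sentence', 'than twenty words']
```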


Answer 2

Try this code:

import re
st = "this is 'a sentence' with less 'than twenty words'"
# [\w\s]+ matches one or more word or whitespace characters between the quotes
re.findall(r"'([\w\s]+)'", st)

Answer 3

import re 
sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
regex = re.compile(r"(?<=')\w+(?:\s+\w+)+(?=')")
regex.findall(sentence)
# ['a sentence', 'than twenty words']

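We want to match strings that start and end with a quote without capturing the quotes themselves, so we use a positive lookbehind assertion (?<=') before the text and a lookahead assertion (?=') after it.

Inside the quotes, we want at least one word followed by at least one group of whitespace and a word. We do not want this to be a capturing group, otherwise findall would return only that group, so we use (?:....) to make it non-capturing.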
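To illustrate the point about capturing groups, here is a small sketch that runs the same pattern twice, once with a capturing group and once with a non-capturing one. The variable names `capturing` and `non_capturing` are chosen here for illustration only.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# With a capturing group, findall returns only what the group captured
# (its last repetition), not the whole match.
capturing = re.findall(r"(?<=')\w+(\s+\w+)+(?=')", sentence)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?<=')\w+(?:\s+\w+)+(?=')", sentence)

print(capturing)      # [' sentence', ' words']
print(non_capturing)  # ['a sentence', 'than twenty words']
```

In both cases 'lonely' is skipped, because the pattern requires at least one whitespace-plus-word repetition after the first word.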

Answer 4

Late answer, but you can use:

import re
string = "this is 'a sentence' with less 'than twenty words'"
# .*? is lazy: it stops at the first closing quote it can
result = re.findall(r"'(.*?)'", string)
print(result)
# ['a sentence', 'than twenty words']
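A short sketch of how the lazy quantifier behaves here, compared with the greedy one. It also shows that this pattern accepts single quoted words, so an extra filter (an assumption added here, not part of the answer) is needed if only multi-word groups are wanted.

```python
import re

text = "this is 'a sentence' with less 'than twenty words'"

# Lazy: each match ends at the nearest closing quote.
lazy = re.findall(r"'(.*?)'", text)

# Greedy: a single match runs from the first quote to the last one.
greedy = re.findall(r"'(.*)'", text)

print(lazy)    # ['a sentence', 'than twenty words']
print(greedy)  # ["a sentence' with less 'than twenty words"]

# The lazy pattern also returns single quoted words such as 'lonely'.
longer = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
all_quoted = re.findall(r"'(.*?)'", longer)

# Hypothetical post-filter: keep only results made of more than one word.
multi_word = [m for m in all_quoted if len(m.split()) > 1]

print(all_quoted)  # ['a sentence', 'than twenty words', 'lonely']
print(multi_word)  # ['a sentence', 'than twenty words']
```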


Python regex star quantifier not working as expected

4 answers: