Question

I'm trying to write a function that extracts only a two-digit integer from text that matches a specific regular expression.

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

    # if there were no matches, return None
    return None

With this, when I print

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

I would get the number 54. If I write the version below instead, I get whatever characters (.+) matches... so why doesn't it work for numbers?

def extract_number(message_text):
    regex_expression = 'What are the top (.+) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))

Answer 1

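The only problem with both of your snippets is that you return the overall match rather than the result of the capture group you're interested in: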

return match.group()

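is the same as return match.group(0): it returns the overall match, which in your case is the whole matched phrase, 'What are the top 54 trends on facebook', not just the number.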

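By contrast, you want index 1, which returns what the first capture group matched. The first capture group is the first subexpression enclosed in (...), here ([0-9]{2}):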

return match.group(1)
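To make the difference concrete, here is a minimal sketch comparing the group indices on the question's own pattern and input. It uses re.search instead of the question's finditer loop purely to keep the example short.

```python
import re

# Same pattern and input as in the question.
regex = re.compile('What are the top ([0-9]{2}) trends on facebook')
match = regex.search('What are the top 54 trends on facebook today')

print(match.group())   # whole match: What are the top 54 trends on facebook
print(match.group(0))  # identical to group() with no argument
print(match.group(1))  # first capture group only: 54
```

Note that group(1) returns the string '54'. If an actual integer is needed, wrapping the result in int() would be one way to convert it.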

Putting it all together:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    # (See bottom of this answer for a loop-less alternative.)
    for match in matches:
        return match.group(1)  # index 1 returns what the 1st capture group matched

    # if there were no matches, return None
    return None

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

This yields the desired output:

54

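Note: As @EvanL00 points out, assuming only one match is ever needed, using regex.finditer() followed by a for loop that unconditionally returns on its first iteration is unnecessary and can obscure the intent of the code. A simpler and clearer approach is: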

match = regex.search(message_text) # Get first match only.
if match:
    return match.group(1)
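For completeness, the fragment above could be placed into the full function roughly as follows. This is a sketch, not the original author's final code: the module-level name TOP_TRENDS_RE is a hypothetical choice made here so the pattern is compiled only once.

```python
import re

# Hypothetical module-level name; the pattern itself is the one from the question.
TOP_TRENDS_RE = re.compile('What are the top ([0-9]{2}) trends on facebook')

def extract_number(message_text):
    # search() scans the string and returns the first match, or None.
    match = TOP_TRENDS_RE.search(message_text)
    if match:
        return match.group(1)  # text matched by the first capture group
    # No match: make the None result explicit.
    return None

print(extract_number('What are the top 54 trends on facebook today'))
print(extract_number('What are the top fifty trends on facebook today'))
```

The first call prints 54 and the second prints None, because "fifty" does not satisfy [0-9]{2}.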

Answer 2

This works for both digits and words:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([a-zA-Z0-9]+) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.findall(message_text)
    if matches:
        return matches[0]

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top 50 trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top -- trends on facebook today'
print(extract_number(message_text))

Output:

fifty
50
None
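The reason matches[0] is just the number or word, and not the whole sentence, is that re.findall returns the capture group's text when the pattern contains exactly one group. A small sketch of that behavior, using the pattern from this answer with and without the parentheses:

```python
import re

text = 'What are the top 50 trends on facebook today'

# One capture group: findall returns the group's text for each match.
with_group = re.findall('What are the top ([a-zA-Z0-9]+) trends on facebook', text)
print(with_group)     # ['50']

# No capture group: findall returns the full matched text instead.
without_group = re.findall('What are the top [a-zA-Z0-9]+ trends on facebook', text)
print(without_group)  # ['What are the top 50 trends on facebook']
```

When nothing matches, findall returns an empty list, so the if matches: check fails and the function falls through to an implicit return of None, which is what the '--' input shows.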

How do I extract a two-digit number from a sentence using a regular expression?
