Question

In the script below, I want to extract the text between the double quotes ("). However, the Python interpreter is not happy, and I can't figure out why...

import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.match(pattern, text)

print m.group()

The output should be find.me-_/\.

Answer 1

match only looks for a match at the beginning of the text.

Use search instead:

#!/usr/bin/env python

import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.search(pattern, text)

print(m.group(1))  # group(1) is the text inside the quotes, without the quotes

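Both match and search return None when they fail to match.

I guess you are getting AttributeError: 'NoneType' object has no attribute 'group' from Python: that is because you assume there will be a match without checking the return value of re.match.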
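A minimal sketch of checking the return value before calling group. The helper name quoted_text is hypothetical and only used for illustration; the pattern and text are the ones from the question.

```python
import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'

def quoted_text(pattern, text):
    # quoted_text is a hypothetical helper name, not part of the re module
    m = re.search(pattern, text)
    if m is None:
        # No match: return None instead of raising AttributeError
        return None
    # group(1) is the part captured by the parentheses, without the quotes
    return m.group(1)

print(quoted_text(pattern, text))
print(quoted_text(pattern, 'no quotes here'))
```

The second call returns None rather than raising, so the caller can decide how to handle a missing match.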

Answer 2

Use re.search() instead of re.match(). The latter only matches at the beginning of the string (like an implicit ^).
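To illustrate the difference, here is a small sketch using the pattern and text from the question. The variable names are only for illustration, and no flags (such as re.MULTILINE) are assumed.

```python
import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'

# re.match only tries at position 0, where the text has 'H', not a quote
at_start = re.match(pattern, text)

# re.search scans forward until the pattern matches somewhere
anywhere = re.search(pattern, text)

# Prepending ^ to the pattern makes re.search fail here too
anchored = re.search('^' + pattern, text)

print(at_start)
print(anchored)
print(anywhere.group(1))
```

Both at_start and anchored are None for this input, while anywhere holds the match for the quoted part.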

Answer 3

You need re.search(), not re.match(), which is anchored at the beginning of the input string.

Documentation here

Answer 4

It works if you write:

m = re.search(pattern, text)

match: looks only at the beginning of the text

search: looks through the whole string

Maybe this can help you understand: http://docs.python.org/library/re.html#matching-vs-searching

Answer 5

Instead of a regex, you can use:

def text_between_quotes(text):
    parts = text.split('"')
    between_quotes = parts[1::2]
    # an even number of parts means an odd number of quotes (i.e. the quotes
    # are unbalanced), so the last element was never closed: discard it
    if len(parts) % 2 == 0:
        return between_quotes[:-1]
    return between_quotes

Split the text on the quotes, and everything at an odd index is between two quotes:

my_string = 'Hello, "find.me-_/\\" please help and "this quote" here'
my_string.split('"')
my_string.split('"')[1::2] # ['find.me-_/\\', 'this quote']

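However, you still need to make sure the quotes are not unbalanced (for example, your text contains three " characters). If you end up with an even number of items after the split, you need to discard the last one, which is what the if statement does.

This assumes you have no quotes inside quotes, and that your text does not mix quote styles or use fancy quotes.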
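For comparison, the same result can be sketched with re.findall under the same assumptions (no nested quotes, only plain " characters). The helper name quoted_with_regex and the unbalanced sample string are made up for illustration.

```python
import re

def quoted_with_regex(text):
    # quoted_with_regex is a hypothetical helper name
    # [^"]* matches any run of characters that are not a double quote,
    # so each match runs from one quote to the next one
    return re.findall(r'"([^"]*)"', text)

balanced = 'Hello, "find.me-_/\\" please help and "this quote" here'
unbalanced = 'one "two" three "four'

print(quoted_with_regex(balanced))
print(quoted_with_regex(unbalanced))
```

An opening quote that is never closed simply fails to match, so the unbalanced case needs no separate check here.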

Python regex to match text between quotes
