Question

I'm having trouble using " in a Python regex findall statement while searching HTML source code.

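I'm searching some HTML source code, but I can't seem to use a quotation mark (") in the findall statement. Because of requirements I can't change, I can't use external libraries (such as BeautifulSoup) to help with the search. (I have changed the variable names.)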

from re import *

def suncorp_find():

    # Setup to find information
    suncorp_file = open('suncorp.html')
    contents_suncorp = suncorp_file.read()

    # Search the HTML file to find the data
    suncorp_titles = findall(r"\"event-title\">(\w )+", contents_suncorp)

    print(suncorp_titles)

suncorp_find()

I expect to get a list containing the titles, but I just get an empty list. When I search for event-title on its own, I get multiple items in the search_titles list.

Thanks in advance for your help.

<h6 class="event-title">Queensland Reds v Jaguares</h6>

Answer 1

Use this regex; the non-greedy .*? captures everything up to the next <:

suncorp_titles = findall(r"\"event-title\">(\w.*?)<", contents_suncorp)

Or why not use the one below? I've removed the \w check, since I'm not sure you actually need it.

suncorp_titles = findall(r"\"event-title\">(.*?)<", contents_suncorp)
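To see what the \w check actually changes, here is a minimal comparison on a hypothetical input that contains one empty title. With \w, the first captured character must be a word character, so an empty title (or one starting with punctuation) is skipped; without it, the empty title comes back as an empty string.

```python
import re

html = (
    '<h6 class="event-title">Queensland Reds v Jaguares</h6>\n'
    '<h6 class="event-title"></h6>\n'  # hypothetical empty title
)

# First character after "> must be a word character
with_word_check = re.findall(r"\"event-title\">(\w.*?)<", html)
# Anything (including nothing) up to the next <
without_check = re.findall(r"\"event-title\">(.*?)<", html)

print(with_word_check)  # ['Queensland Reds v Jaguares']
print(without_check)    # ['Queensland Reds v Jaguares', '']
```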

I used this input:

<h6 class="event-title">Queensland Reds v Jaguares</h6>
<h6 class="event-title">testing line two</h6>

Output:

['Queensland Reds v Jaguares', 'testing line two']
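A sketch of the question's function with the second pattern plugged in, assuming the file is UTF-8 encoded. TITLE_PATTERN and find_titles are hypothetical names introduced here; splitting the regex into find_titles lets you test it on a string without needing the file.

```python
import re

# Text between event-title"> and the next "<" (non-greedy)
TITLE_PATTERN = re.compile(r"\"event-title\">(.*?)<")


def find_titles(contents):
    """Return every event title found in an HTML string."""
    return TITLE_PATTERN.findall(contents)


def suncorp_find(path="suncorp.html"):
    # "with" closes the file automatically; UTF-8 is an assumption
    with open(path, encoding="utf-8") as suncorp_file:
        return find_titles(suncorp_file.read())


print(find_titles('<h6 class="event-title">Queensland Reds v Jaguares</h6>'))
# ['Queensland Reds v Jaguares']
```

Because . does not match a newline by default, a title that is split across lines would not be captured; compiling with re.DOTALL changes that.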

Answer 2

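You should escape the " character. Note also that the pattern below puts \w and the space inside a character class, [\w ]+, which matches any run of word characters and spaces; the question's (\w )+ instead requires every single word character to be followed by a space.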

from re import findall

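tmp = """<some_tag name="event-title">Some text 1</some_tag>
<some_tag name="event-title">Some text 2</some_tag>
<some_tag name="event-title">Some text 3</some_tag>"""

# Raw string so \w reaches the regex engine unchanged;
# [\w ]+ matches any run of word characters and spaces
result = findall(r"\"event-title\">([\w ]+)", tmp)
print(result)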

Output:

['Some text 1', 'Some text 2', 'Some text 3']
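A short check, using the question's sample line, that the escaped quote is not what breaks the original pattern; the difference is the grouping. The last input is hypothetical and shows that [\w ]+ stops at punctuation such as a hyphen.

```python
import re

html = '<h6 class="event-title">Queensland Reds v Jaguares</h6>'

# The escaped quote works: the prefix on its own is found
prefix_only = re.findall(r"\"event-title\">", html)

# (\w )+ repeats "one word character, then one space", so it fails at "Qu"
grouped = re.findall(r"\"event-title\">(\w )+", html)

# [\w ]+ is a character class: any run of word characters and spaces
char_class = re.findall(r"\"event-title\">([\w ]+)", html)

# The class has no "-", so matching stops before the hyphen
limited = re.findall(r"\"event-title\">([\w ]+)",
                     '<h6 class="event-title">Reds - Round 1</h6>')

print(prefix_only)  # ['"event-title">']
print(grouped)      # []
print(char_class)   # ['Queensland Reds v Jaguares']
print(limited)      # ['Reds ']
```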

P.S. I recommend verifying your expressions on a regex testing website.
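If the restriction only rules out third-party packages (an assumption; the question does not say), Python's standard-library html.parser is another option that does not depend on the exact attribute spacing or on titles being free of nested tags. A minimal sketch; EventTitleParser is a hypothetical name. With the default convert_charrefs=True, entities such as &amp; arrive already decoded.

```python
from html.parser import HTMLParser


class EventTitleParser(HTMLParser):
    """Collect the text of every element whose class list includes "event-title"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._title_tag = None  # tag name of the title currently open, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "event-title" in classes:
            self._title_tag = tag
            self.titles.append("")

    def handle_endtag(self, tag):
        # Only the matching close tag ends the title, so nested <b> etc. is kept
        if tag == self._title_tag:
            self._title_tag = None

    def handle_data(self, data):
        if self._title_tag is not None:
            self.titles[-1] += data


parser = EventTitleParser()
parser.feed('<h6 class="event-title">Queensland Reds v Jaguares</h6>\n'
            '<h6 class="event-title">Reds &amp; <b>Jaguares</b></h6>')
parser.close()
print(parser.titles)  # ['Queensland Reds v Jaguares', 'Reds & Jaguares']
```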
