Question

The pattern works without any problem on https://regex101.com/. Am I missing something? The whole string is on a single line.

import re

def get_title_and_content(html):
  # Hard-coded sample input; it overrides the html argument.
  html = """<!DOCTYPE html>     <html>       <head>       <title>Change delivery date with Deliv</title>       </head>       <body>       <div class="gkms web">The delivery date can be changed up until the package is assigned to a driver.</div>       </body>     </html>  """
  title_pattern = re.compile(r'<title>(.*?)</title>(.*)')
  match = title_pattern.match(html)
  if match:
    print('successfully extract title and answer')
    return match.groups()[0].strip(), match.groups()[1].strip()
  else:
    print('unable to extract title or answer')

Answer 1

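To summarize the comments:

title_pattern.search(html) should be used instead of title_pattern.match(html)

This is because search() looks for the pattern anywhere in the given string, while match() only tries to match at the start of it. match = title_pattern.findall(html) can be used in a similar way, but it returns a list of results instead of a single match object. Since this pattern has two groups, each item in that list is a tuple.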
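A minimal sketch of the difference, using a shortened copy of the question's one-line document (the variable names anchored, found and all_matches are only illustrative):

```python
import re

# Shortened, single-line version of the HTML from the question.
html = ('<!DOCTYPE html> <html> <head> <title>Change delivery date with Deliv</title> '
        '</head> <body> <div class="gkms web">The delivery date can be changed.</div> '
        '</body> </html>')

title_pattern = re.compile(r'<title>(.*?)</title>(.*)')

# match() only succeeds if the pattern matches at position 0.
# The string starts with "<!DOCTYPE", not "<title>", so this is None.
anchored = title_pattern.match(html)

# search() scans forward and returns the first match anywhere in the string.
found = title_pattern.search(html)
title, rest = (group.strip() for group in found.groups())

# findall() returns a list; with two groups, each item is a (title, rest) tuple.
all_matches = title_pattern.findall(html)

print(anchored)  # None
print(title)     # Change delivery date with Deliv
```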

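Also, as mentioned in the comments, using a parser such as BeautifulSoup will pay off in the long run, because regular expressions are poorly suited to searching HTML.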
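As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's html.parser as a stand-in for BeautifulSoup, so that it runs without installing anything. The class name TitleAndBodyParser is hypothetical, and treating the stripped body text as the "content" is an assumption about what the question wants:

```python
from html.parser import HTMLParser


class TitleAndBodyParser(HTMLParser):
    """Hypothetical helper: collects text inside <title> and <body> separately."""

    def __init__(self):
        super().__init__()
        self._section = None   # "title", "body", or None
        self.title_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        # Only <title> and <body> switch sections; nested tags such as <div> do not.
        if tag in ("title", "body"):
            self._section = tag

    def handle_endtag(self, tag):
        if tag == self._section:
            self._section = None

    def handle_data(self, data):
        if self._section == "title":
            self.title_parts.append(data)
        elif self._section == "body":
            self.body_parts.append(data)


def get_title_and_content(html):
    parser = TitleAndBodyParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    # Collapse runs of whitespace left over from the markup layout.
    content = " ".join("".join(parser.body_parts).split())
    return title, content


html = """<!DOCTYPE html>     <html>       <head>       <title>Change delivery date with Deliv</title>       </head>       <body>       <div class="gkms web">The delivery date can be changed up until the package is assigned to a driver.</div>       </body>     </html>  """

title, content = get_title_and_content(html)
print(title)
print(content)
```

Unlike the regex, this does not depend on where <title> sits in the string or on the document being on one line.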

Answer 2

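The comments are correct: re.match() only matches at the beginning of the string. That said, you can keep match() if you insert .* at the start of your regex, so that the pattern itself matches from the beginning.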
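The answer's original snippet is not preserved here, so the following is only a sketch of the idea, assuming the prefix meant is a non-greedy .*? placed before <title> (the name prefixed_pattern is illustrative):

```python
import re

html = """<!DOCTYPE html>     <html>       <head>       <title>Change delivery date with Deliv</title>       </head>       <body>       <div class="gkms web">The delivery date can be changed up until the package is assigned to a driver.</div>       </body>     </html>  """

# The leading .*? lets the pattern consume everything before <title>,
# so match() can succeed even though it is anchored at position 0.
prefixed_pattern = re.compile(r'.*?<title>(.*?)</title>(.*)')

match = prefixed_pattern.match(html)
title, rest = (group.strip() for group in match.groups())

print(title)  # Change delivery date with Deliv
```

This works here because the whole document is on one line. By default . does not match a newline, so a multi-line document would also need the re.DOTALL flag.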
