Question

When I read the text, some of its lines contain strings such as <h3 class="heading">General Purpose</h3>. From such a string I want to extract only the value General Purpose.

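import re

# data2 holds the text being searched (defined elsewhere)
d = re.search(re.escape('<h3 class="heading">') + "(.*?)" + re.escape('</h3>'), str(data2))
if d:
    print(d.group(0))  # prints the whole match, including the tags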

Answer 1

import re

text = """<h3 class="heading">General Purpose</h3>"""
# group 1: opening tag, group 2: the text, group 3: closing tag
pattern = r"(<.*?>)(.*)(<.*?>)"

g = re.search(pattern, text)
g.group(2)

Output:

'General Purpose'
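One caveat with this pattern: the middle group (.*) is greedy, so if a single line happens to contain two headings, it swallows everything from the first opening tag to the last closing tag. The sketch below uses a made-up two-heading line to compare it with a lazy (.*?) anchored on the literal tags; the sample input is an assumption, not from the question.

```python
import re

# Hypothetical input: two headings on the same line.
line = '<h3 class="heading">A</h3> <h3 class="heading">B</h3>'

# Answer 1's pattern: the greedy (.*) runs up to the last tag on the line.
greedy = re.search(r"(<.*?>)(.*)(<.*?>)", line).group(2)

# Lazy group between the literal tags: one result per heading.
lazy = re.findall(r'<h3 class="heading">(.*?)</h3>', line)

print(greedy)  # A</h3> <h3 class="heading">B
print(lazy)    # ['A', 'B']
```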

If it is a BeautifulSoup object, getting the value is even easier. You don't need a regular expression at all.

from bs4 import BeautifulSoup

text = """<h3 class="heading">General Purpose</h3>"""
a = BeautifulSoup(text, "html.parser")
print(a.select('h3.heading')[0].text)

Output:

General Purpose

Answer 2

Group 0 contains the entire match; you want the contents of group 1:

print(d.group(1))
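To see the difference, here is the question's own pattern applied to a sample line. The question's data2 is replaced by a hypothetical string; group(0) returns the full match with the tags, group(1) only the captured text.

```python
import re

# Hypothetical stand-in for the question's data2.
data2 = 'intro <h3 class="heading">General Purpose</h3> outro'

# Same construction as in the question: escape the literal tags,
# capture the text between them lazily.
pattern = re.escape('<h3 class="heading">') + "(.*?)" + re.escape("</h3>")

d = re.search(pattern, data2)
if d:
    whole = d.group(0)  # the entire match, tags included
    inner = d.group(1)  # only the first capture group
    print(whole)  # <h3 class="heading">General Purpose</h3>
    print(inner)  # General Purpose
```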

In general, though, using regular expressions to parse HTML is not a good idea (although in practice, nested h3 tags should be fairly uncommon).
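If installing BeautifulSoup is not an option, the standard library's html.parser.HTMLParser can do the job without regular expressions. This is a minimal sketch, not part of the original answers: the class name HeadingCollector is hypothetical, and it assumes headings are not nested and that the class attribute is exactly "heading" (a value like "heading big" would not match).

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <h3 class="heading">."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._inside = False  # True between a matching <h3> and its </h3>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; assumes an exact class match
        if tag == "h3" and ("class", "heading") in attrs:
            self._inside = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.headings[-1] += data


parser = HeadingCollector()
parser.feed('<h3 class="heading">General Purpose</h3><h3>Other</h3>')
parser.close()
print(parser.headings)  # ['General Purpose']
```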

Answer 3

Warning: this relies on lookbehind, which is not available in every regex flavor; JavaScript, for example, did not support it before ES2018.

(?<=<h3 class="heading">).*?(?=</h3>)
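A quick check of the lookbehind approach in Python, on a hypothetical sample line. It also shows why the tag text should not be over-escaped: since Python 3.6, an unknown escape of an ASCII letter, such as \h in the pattern as it was first posted, makes re raise re.error.

```python
import re

line = 'text <h3 class="heading">General Purpose</h3> more text'

# Lookbehind/lookahead assert the tags without including them in the match.
pattern = r'(?<=<h3 class="heading">).*?(?=</h3>)'
match = re.search(pattern, line)
print(match.group(0))  # General Purpose

# The originally posted pattern escapes the letter "h" (\h), which re rejects.
try:
    re.compile(r'(?<=\<\h3 class=\"heading\"\>).*?(?=\<\/h3\>)')
    original_ok = True
except re.error:
    original_ok = False
print(original_ok)  # False on Python 3.6+
```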