Question

How do I read the value from the following tag?

<th class="class_name"> Sample Text </th>

Can anyone help me get the string "Sample Text" from the HTML above using Python?

Thanks.

Answer 1

You can use BeautifulSoup, which is my favourite library for parsing HTML.

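from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

html = '<th class="class_name"> Sample Text </th>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.th.text)  # text of the first <th>, surrounding spaces included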

Answer 2

A regular expression solution:

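import re

# Example input; replace with your own HTML string.
input_string = '<th class="class_name"> Sample Text </th>'

th_regex = re.compile(r'<th\s+class="class_name">(.*?)</th>')
search_result = th_regex.search(input_string)

# group(1) is the text captured between the tags
print(search_result.group(1) if search_result else 'not found')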

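Note: you need the ? after .* to make the match non-greedy, so that it stops consuming characters at the first </th>. Otherwise the match runs on to the last </th> in input_string.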
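
To illustrate the note above, here is a minimal sketch comparing the two patterns. The two-cell input_string is a made-up example, not taken from the question; it is only there to make the difference visible.

```python
import re

# Hypothetical input with two <th> cells on one line (an assumption for this demo).
input_string = '<th class="class_name"> First </th><th class="class_name"> Second </th>'

# Non-greedy: (.*?) stops at the first </th>.
lazy = re.search(r'<th\s+class="class_name">(.*?)</th>', input_string)
# Greedy: (.*) runs on to the last </th> on the line.
greedy = re.search(r'<th\s+class="class_name">(.*)</th>', input_string)

lazy_text = lazy.group(1)
greedy_text = greedy.group(1)

print(repr(lazy_text))    # ' First '
print(repr(greedy_text))  # ' First </th><th class="class_name"> Second '
```

With a single <th> in the input both patterns return the same text, so the difference only shows up once the string contains more than one closing tag.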

Answer 3

You can use minidom to parse it. I'm not sure what your exact requirements are, though.

from xml.dom import minidom

html = '<th class="class_name"> Sample Text </th>'
dom = minidom.parseString(html)  # the input must be well-formed XML
for elem in dom.getElementsByTagName('th'):
    if elem.getAttribute('class') == 'class_name':
        print(elem.firstChild.nodeValue)

Answer 4

A regular expression solution:

import re

s = '<th class="class_name"> Sample Text </th>'
data = re.findall(r'<th class="class_name">(.*?)</th>', s)
print(data)  # a list with one captured string per match
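
Note that re.findall returns a list, and the captured text keeps the spaces around "Sample Text". If the bare string is what is needed, one possible follow-up is to strip each match. This is a sketch, and the names raw and values are only illustrative:

```python
import re

s = '<th class="class_name"> Sample Text </th>'

# findall returns one captured string per match, padding included.
raw = re.findall(r'<th class="class_name">(.*?)</th>', s)

# Strip the leading and trailing whitespace from each match.
values = [text.strip() for text in raw]

print(raw)     # [' Sample Text ']
print(values)  # ['Sample Text']
```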
