Question

I have a question about extracting the title from a line of HTML.

Let's say my line is:

<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>

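And lol, I had to add some extra spaces to that line so that it wouldn't show up as a hyperlink.

How would I automatically extract "Fairwood", given many similarly formatted lines with different ids and titles?

Thanks in advance.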

Answer 1

What's wrong with a parser solution?

import xml.etree.ElementTree as ET
root = ET.fromstring('<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>')
print(root.find("a").text)
# Fairwood
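For many lines at once, or for markup that is not well-formed XML, a similar idea can be sketched with the standard library's html.parser. This is only a sketch: the class name TitleExtractor, the helper extract_titles, and the second sample line ("Oak Hill") are made up for illustration, and it assumes every title sits in an <a> inside a <span class="title_name"> with no nested spans.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <a> inside <span class="title_name">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_span = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "title_name":
            self._in_span = True
        elif tag == "a" and self._in_span:
            # Start a new, empty title; handle_data fills it in.
            self._in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_link:
            self.titles[-1] += data


def extract_titles(html):
    """Return the link text of every title span found in html."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical sample input; the second line is invented for the example.
sample = """
<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>
<span class="title_name"> <a href="/?id=2125">Oak Hill</a></span>
"""
print(extract_titles(sample))
# ['Fairwood', 'Oak Hill']
```

Unlike ET.fromstring, this does not need a single root element, so the whole page can be fed in one go.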

Answer 2

If the lines are all formatted alike, you can try:

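import re

html = '''
<span class="title_name1"> <a href="/?id=2124">Fairwood1</a></span>
<span class="title_name2"> <a href="/?id=2125">Fairwood2</a></span>'''
# Match the run of word characters immediately before the closing </a></span>.
print(re.findall(r'\w+(?=</a></span>)', html))
# ['Fairwood1', 'Fairwood2']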
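Note that \w+ only picks up the last word of a title, so "Oak Hill" would come back as "Hill". A possible variation, assuming every link has the form href="/?id=<digits>" as in the question, captures the id and the full link text with named groups. The names LINK_RE and extract_pairs and the second sample line are made up for this sketch.

```python
import re

# Assumes links look like <a href="/?id=NNNN">Title</a></span>, as in the question.
LINK_RE = re.compile(
    r'<a href="/\?id=(?P<id>\d+)">(?P<title>[^<]+)</a>\s*</span>'
)


def extract_pairs(html):
    """Return a list of (id, title) tuples, one per matching link."""
    return [
        (int(m.group("id")), m.group("title").strip())
        for m in LINK_RE.finditer(html)
    ]


# Hypothetical sample input; the second line is invented for the example.
sample = '''
<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>
<span class="title_name"> <a href="/?id=2125">Oak Hill</a></span>'''
print(extract_pairs(sample))
# [(2124, 'Fairwood'), (2125, 'Oak Hill')]
```

This still depends on the exact markup; if the attribute order or quoting varies, a parser is the safer choice.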

Extracting the title from my HTML line
