Question

This is my string:

content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

I tried the following regular expression to extract the text between the h5 tags:

   import re
   import string

   reg = re.search(r'<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>([A-Za-z0-9%s]+)</h5></span></td></tr>' % string.punctuation,content)

This returns exactly what I want.

Is there a more Pythonic way to get this?

Answer 1

Dunno if this is any more Pythonic, but it treats the content as HTML data.

from lxml import html
content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
HtmlData = html.fromstring(content)
ListData = HtmlData.xpath('//text()')

To get the last element:

ListData[-1]
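
Taking the last text node relies on the h5 being the final piece of text in the row. If that ordering is not guaranteed, one option is to collect only the text inside h5 elements. Below is a minimal sketch using the standard library's html.parser instead of lxml; the class name H5TextParser and the variable h5_texts are made up for illustration and are not part of the original answer. With lxml, an XPath such as '//h5/text()' should select the same text.

```python
from html.parser import HTMLParser


class H5TextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <h5> elements."""

    def __init__(self):
        super().__init__()
        self.in_h5 = False  # True while the parser is inside an <h5> element
        self.texts = []     # text fragments seen inside <h5> elements

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self.in_h5 = True

    def handle_endtag(self, tag):
        if tag == "h5":
            self.in_h5 = False

    def handle_data(self, data):
        # Keep only text that appears between <h5> and </h5>
        if self.in_h5:
            self.texts.append(data.strip())


content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

parser = H5TextParser()
parser.feed(content)
parser.close()

h5_texts = parser.texts
print(h5_texts[0])  # Yadgiri
```

This avoids hard-coding the surrounding markup the way the regular expression does, so it should keep working if the th label or the class names change.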
