我有一个像这样的HTML文件:
<div id="note">
<a name="overview"></a>
<h3>Overview</h3>
<p>some text1...</p>
<a name="description"></a>
<h3>Description</h3>
<p>some text2 ...</p>
</div>
`
我想为每个标题检索段落。 例如,概述:一些text1 描述:一些文字2 ... 我想用xpath在python中写这个。 谢谢。
答案 0 :(得分:0)
找到所有h3
标记,迭代它们,然后在迭代循环的每一步中,找到下一个兄弟标记p
:
import urllib2
from lxml import etree
URL = "http://www.kb.cert.org/vuls/id/628463"
response = urllib2.urlopen(URL)
parser = etree.HTMLParser()
tree = etree.parse(response, parser)
for header in tree.iter('h3'):
paragraph = header.xpath('(.//following-sibling::p)[1]')
if paragraph:
print "%s: %s" % (header.text, paragraph[0].text)
打印:
Overview: The Ruby on Rails 3.0 and 2.3 JSON parser contain a vulnerability that may result in arbitrary code execution.
Description: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Impact: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Solution: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Vendor Information : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
CVSS Metrics : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
References: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Credit: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Feedback: If you have feedback, comments, or additional information about this vulnerability, please send us
Subscribe to Updates: Receive security alerts, tips, and other updates.