Here is a piece of HTML that I want to extract information from:
<li>
<p><strong class="more-details-section-header">Provenance</strong></p>
<p>Galerie Max Hetzler, Berlin<br>Acquired from the above by the present owner</p>
</li>
I would like an XPath expression that extracts the content of the second <p> ... </p>, but only when the first <p> ... </p> contains "Provenance".
Here is what I have so far:
if "Provenance" in response.xpath('//strong[@class="more-details-section-header"]/text()').extract():
    print("provenance = yes")
But how do I get to "Galerie Max Hetzler, Berlin<br>Acquired from the above by the present owner"?
I tried:
if "Provenance" in response.xpath('//strong[@class="more-details-section-header"]/text()').extract():
    print("provenance = yes ", response.xpath('//strong[@class="more-details-section-header"]/following-sibling::p').extract())
But I get [].
Answer 0 (score: 1)
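Your attempt returns [] because the <strong> element is a child of the first <p>, so it has no <p> siblings of its own; following-sibling only looks at nodes that share the same parent. Instead, select the <p> whose immediately preceding sibling <p> contains a <strong> with the text "Provenance". You should use: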
//p[preceding-sibling::p[1]/strong='Provenance']/text()