Question

I'm using Selenium with Python, and I want to select the HTML that comes before each hr tag. This is my code:

<div id="wikipage">
<div id="wikipage-inner">
<h1>Berkeley</h1>
<p><span><strong>Title1</strong></span></p>
<p><strong>Address: </strong>..</p>
<p><strong>Website: </strong><a href="..">..</a></p>
<p><strong>Phone: </strong>..</p>

<hr />

<p><strong><span>Title2</span></strong></p>
<p><strong>Address: </strong>..</p>
<p><strong>Website:</strong> <a href="..">..</a></p>
<p><strong>Phone:</strong> ..</p>
<p><strong>Email:</strong> <a href="mailto:..">..</a></p>

<hr />
</div>
</div>

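I'm using regular expressions to extract title-address-website-phone-email .. into a CSV file, so I need the text before every hr tag across the whole page. The result would be a list, something like this: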

This is a text before hr: Title1 Address: .. Website: .. Phone: ..
This is a text before hr: Title2 Address ..

when writing:

for p in parag:
    print('This is a text before hr: ', p.text)

I would appreciate some help.

Answer 1

If you have a fixed number of <p> nodes, you can try this XPath:

//hr[x]/preceding-sibling::p[position()<=y]

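x is the position of the <hr/> tag, and y is the number of <p> tags before that <hr/> that you want to select. Because preceding-sibling is a reverse axis, position() counts backwards from the <hr/>, so the expression picks the y nearest <p> siblings.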

For example, to select all 5 <p> nodes before the second <hr/>, I would use this XPath:

//hr[2]/preceding-sibling::p[position()<=5]
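From Python, the fixed-count expression might be used roughly as follows. This is only a sketch: `fixed_count_xpath` and `text_before_hr` are hypothetical helper names, and `driver` is assumed to be a Selenium WebDriver that has already loaded the page.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH


def fixed_count_xpath(hr_index: int, p_count: int) -> str:
    """Build the XPath for the p_count <p> siblings just before the hr_index-th <hr/>."""
    return f"//hr[{hr_index}]/preceding-sibling::p[position()<={p_count}]"


def text_before_hr(driver, hr_index: int, p_count: int) -> str:
    """Join the visible text of the matched <p> elements into one line.

    `driver` is assumed to be a Selenium WebDriver (hypothetical here);
    only its find_elements(by, value) method and each element's .text are used.
    """
    elements = driver.find_elements(XPATH, fixed_count_xpath(hr_index, p_count))
    return " ".join(el.text for el in elements)
```

With the sample page, `text_before_hr(driver, 2, 5)` would join the five paragraphs of the Title2 block into a single string.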

If you don't have a fixed number of <p> tags, you have to use a more complex XPath:

//hr[x]/preceding-sibling::p[position()<=count(//hr[x]/preceding-sibling::p) - count(//hr[y]/preceding-sibling::p)]

x is the position of the bottom <hr/> tag, and y is the position of the top <hr/> tag.

So, to select the same nodes as in my first example, you would use this XPath:

//hr[2]/preceding-sibling::p[position()<=count(//hr[2]/preceding-sibling::p) - count(//hr[1]/preceding-sibling::p)]

This selects all the <p> tags between the first and second <hr/>.
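If the goal is one line of text per <hr/>, another option is to skip the position arithmetic and walk the container's children in order, starting a new group at each <hr/>. The sketch below shows that idea on a shortened copy of the sample markup. It uses the standard library's `xml.etree.ElementTree` purely as a stand-in so that it runs without a browser; that parser needs well-formed markup and does not support the preceding-sibling axis. `blocks_before_each_hr` and `SAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the markup from the question.
SAMPLE = """<div id="wikipage-inner">
<h1>Berkeley</h1>
<p><span><strong>Title1</strong></span></p>
<p><strong>Address: </strong>..</p>
<p><strong>Phone: </strong>..</p>
<hr />
<p><strong><span>Title2</span></strong></p>
<p><strong>Address: </strong>..</p>
<hr />
</div>"""


def blocks_before_each_hr(container):
    """Return one string per <hr/>, holding the text of the <p> siblings before it."""
    blocks, current = [], []
    for child in container:  # direct children, in document order
        if child.tag == "hr":
            blocks.append(" ".join(current))
            current = []
        elif child.tag == "p":
            # itertext() also collects text nested in <strong>, <span>, <a>
            current.append(" ".join("".join(child.itertext()).split()))
    return blocks


for block in blocks_before_each_hr(ET.fromstring(SAMPLE)):
    print("This is a text before hr:", block)
```

With Selenium, the same loop could presumably run over `driver.find_elements(By.XPATH, "//div[@id='wikipage-inner']/*")`, checking each element's `tag_name` and reading its `text`.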
