Question

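I am scraping job postings from other websites. The source site has several different cases, because users copy and paste data and the structure changes.

Case 1: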

<h3>Job Description</h3>
<div style="text-align: justify; line-height: 115%"><b>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</div>

Case 2:

<h3>Job Description</h3>
<p>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</p>

In this case, the p tag is sometimes replaced by other HTML tags.

Case 3:

<h3>Job Description</h3>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.

I am using the following expression to get the content. It currently works for case 3 but not for the other two cases. How can I handle all three?

//text()[preceding::h3[text()="Job Description"]]

Answer 1

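Your XPath expression selects text nodes that are preceded by an <h3> whose text node equals "Job Description". That only matches the third case, because in the first two cases the <h3> is followed by a <div> and a <p> respectively, and the text sits inside those elements.

You could try something like this: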
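To see where the description text sits relative to the <h3> in each case, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the markup has been tidied into well-formed XML and wrapped in a hypothetical <root> element; the sample text is shortened. In ElementTree, a text node that directly follows an element is exposed as that element's tail.

```python
import xml.etree.ElementTree as ET

TEXT = "Receptionist is assigned for ANAFAE-ALC."  # shortened sample text

# Hypothetical, tidied versions of cases 2 and 3, wrapped in <root>.
case2 = f"<root><h3>Job Description</h3><p>{TEXT}</p></root>"
case3 = f"<root><h3>Job Description</h3>{TEXT}</root>"

def describe(xml):
    """Report what directly follows the <h3>: a bare text node or an element."""
    root = ET.fromstring(xml)
    h3 = root.find("h3")
    siblings = list(root)
    next_element = siblings[1] if len(siblings) > 1 else None
    return {
        # Text node that is a sibling of the <h3> (empty when there is none).
        "text_sibling_of_h3": (h3.tail or "").strip(),
        # Tag of the element that follows the <h3>, if any.
        "next_element": None if next_element is None else next_element.tag,
    }

print(describe(case2))
print(describe(case3))
```

Only in case 3 is the text a sibling of the <h3>; in case 2 (and likewise case 1) it is a child of the element that follows the heading.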

//node()[preceding-sibling::*[1][self::h3 = "Job Description"]]/string()

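Some details:

//node() selects all element and text node descendants, starting from the initial context.

preceding-sibling::*[1] selects the nearest preceding sibling element.

[self::h3 = "Job Description"] checks that this element is an <h3> and that its string value equals "Job Description".

/string() returns the string value of the context node. For your sample content you could use /descendant-or-self::text() instead. That works by selecting the context node (if it is a text node) and all descendant text nodes (if it is an element). However, if the <div> or <p> changes to have mixed content (that is, child elements interspersed with text nodes), that expression returns a sequence of descendant text nodes, whereas /string() concatenates them.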
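The selection logic described above can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration, not an evaluation of the XPath itself: ElementTree's XPath subset has no preceding-sibling axis, so the sketch walks the tree by hand. It assumes the markup has been tidied into well-formed XML (for example, the unclosed <b> in case 1 is closed) and wrapped in a hypothetical <root> element; job_description is a made-up helper name.

```python
import xml.etree.ElementTree as ET

TEXT = "Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif."

# Hypothetical, tidied versions of the three cases, wrapped in <root>.
CASES = {
    1: f'<root><h3>Job Description</h3><div style="text-align: justify"><b>{TEXT}</b></div></root>',
    2: f"<root><h3>Job Description</h3><p>{TEXT}</p></root>",
    3: f"<root><h3>Job Description</h3>{TEXT}</root>",
}

def job_description(xml, heading="Job Description"):
    """Return the text of the nodes that directly follow the matching <h3>."""
    root = ET.fromstring(xml)
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # Counterpart of [self::h3 = "Job Description"]; the strip() is an
            # added tolerance for surrounding whitespace, not part of the XPath.
            if child.tag != "h3" or "".join(child.itertext()).strip() != heading:
                continue
            # Bare text node right after the heading (case 3).
            parts = [child.tail or ""]
            # Next sibling element, if any (cases 1 and 2); joining its text
            # nodes plays the role of string().
            if i + 1 < len(children):
                parts.append("".join(children[i + 1].itertext()))
            return "".join(parts).strip()
    return None

for number, xml in CASES.items():
    print(number, job_description(xml))

# Mixed content: a sequence of text nodes versus one concatenated string.
mixed = ET.fromstring("<div>Receptionist is <b>assigned</b> for ANAFAE-ALC.</div>")
text_nodes = list(mixed.itertext())   # like /descendant-or-self::text()
string_value = "".join(text_nodes)    # like /string()
print(text_nodes)
print(string_value)
```

All three cases yield the same description text. The last two prints show the difference noted above: with mixed content, the text-node selection gives three separate pieces, while the string value is a single concatenated string.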
