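I am scraping job postings from other websites. The source site shows several different cases, because users copy and paste their data and the structure changes.
Case 1: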
<h3>Job Description</h3>
<div style="text-align: justify; line-height: 115%"><b>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</div>
Case 2:
<h3>Job Description</h3>
<p>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</p>
In this case, the p tag is sometimes replaced by other HTML tags.
Case 3:
<h3>Job Description</h3>
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.
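I am using the expression below to get the content. It currently works for case 3, but not for the other two cases. How can I solve this for all three cases?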
//text()[preceding::h3[text()="Job Description"]]
Answer 0 (score: 0)
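Your XPath expression selects text nodes that are preceded by an <h3> whose text node equals "Job Description". This matches only the third case, because in the first two cases the <h3> is followed by a <div> and a <p>, respectively.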
You could try something like this:
//node()[preceding-sibling::*[1][self::h3 = "Job Description"]]/string()
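To try the idea without an XPath 2.0 engine (the trailing /string() step needs XPath 2.0 or later), here is a minimal Python sketch using only the standard library. It is an emulation, not the expression itself: xml.etree.ElementTree supports only a subset of XPath without the preceding-sibling axis, so the logic is spelled out by hand. The function name job_description and the sample constants are hypothetical, the sample text is shortened, the <b> in case 1 is closed so the fragment parses as well-formed XML, and only the first match is returned.

```python
import xml.etree.ElementTree as ET

# Shortened sample text; the three fragments mirror the three cases in the question.
TEXT = "Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif."
CASE_1 = (
    '<h3>Job Description</h3>\n'
    '<div style="text-align: justify; line-height: 115%"><b>\n'
    + TEXT + '</b></div>'  # <b> closed here (assumption) to keep the XML well-formed
)
CASE_2 = '<h3>Job Description</h3>\n<p>\n' + TEXT + '</p>'
CASE_3 = '<h3>Job Description</h3>\n' + TEXT


def job_description(fragment: str) -> str:
    """Return the text of the first non-blank node after <h3>Job Description</h3>."""
    # Wrap the fragment in a root element so it parses as a single document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # Heading text is stripped before comparing (slightly more lenient than XPath "=").
            heading = "".join(child.itertext()).strip()
            if child.tag != "h3" or heading != "Job Description":
                continue
            # Case 3: bare text directly after the heading is stored in h3.tail.
            if child.tail and child.tail.strip():
                return child.tail.strip()
            # Cases 1 and 2: the next sibling element (div, p, ...) holds the text.
            if i + 1 < len(children):
                return "".join(children[i + 1].itertext()).strip()
    return ""


for case in (CASE_1, CASE_2, CASE_3):
    print(job_description(case))
```

All three fragments should print the same sentence. Real-world pages are rarely well-formed XML, so a lenient HTML parser would be needed in front of this logic.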
Some details:
//node()
Selects every descendant node of the initial context, elements and text nodes alike.
preceding-sibling::*[1]
Selects the nearest preceding sibling element.
[self::h3 = "Job Description"]
Checks that this element is an <h3> and that its string value equals "Job Description".
/string()
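Returns the string value of the context node. For your sample content, /descendant-or-self::text() could be used instead. It works by selecting the context node itself (if it is a text node) and all of its descendant text nodes (if it is an element). However, if the <div> or <p> changes to have mixed content (that is, child elements interspersed with text nodes), that expression returns a sequence of descendant text nodes, whereas /string() concatenates them into a single value.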
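The difference between the two variants on mixed content can be illustrated with a small Python sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for an XPath engine: itertext() plays the role of /descendant-or-self::text(), and joining its items plays the role of /string(). The sample paragraph is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content paragraph: text nodes interleaved with a child element.
p = ET.fromstring("<p>Receptionist is <b>assigned</b> for ANAFAE-ALC.</p>")

# Counterpart of /descendant-or-self::text(): one item per text node.
text_nodes = list(p.itertext())

# Counterpart of /string(): all text nodes concatenated into one value.
string_value = "".join(text_nodes)

print(text_nodes)    # ['Receptionist is ', 'assigned', ' for ANAFAE-ALC.']
print(string_value)  # Receptionist is assigned for ANAFAE-ALC.
```

If the scraper expects one string per job description, the concatenated form is usually the more convenient one.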