I have the following HTML tag structure.
<?xml version="1.0" encoding="UTF-8"?><div class="MainContentTextContainer">
<br/> To break the stalemate of heavy competition and low growth in the traditional orthopedic implant markets, the major orthopedic companies are turning to biologics. This white paper provides some information on this trend. Information from this White Paper was obtained from Kalorama's full market study on this market 'The World Market for Orthopedic Biomaterials SKU KLI6329663," as well as news media sources. <br/>
<br/>
<p> "We also feature department and global pricing for reports that we be utilized by more than one user at your company."
</p>
<p>
<b>Related Reports:</b>
</p>
<!-- [PID:6921310] -->
<a href="http://www.kaloramainformation.com/Global-Medical-Devices-6921310/" class="StandardLink DkBlueType">The Global Market for Medical Devices, 3rd. Edition</a>
<br/>May 2, 2012 - KLI3873247 - $1,995.00<br/>
<br/>
</div>
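The above is my HTML node structure. I want to return the MainContentTextContainer node with everything from the last <p> tag, the one containing 'Related Reports:', onward removed.
That is, I want the output to be: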
<div class="MainContentTextContainer">
<br/> To break the stalemate of heavy competition and low growth in the traditional orthopedic implant markets, the major orthopedic companies are turning to biologics. This white paper provides some information on this trend. Information from this White Paper was obtained from Kalorama's full market study on this market 'The World Market for Orthopedic Biomaterials SKU KLI6329663," as well as news media sources. <br/>
<br/>
<p> "We also feature department and global pricing for reports that we be utilized by more than one user at your company."
</p>
</div>
I used the following XPath:
//div[@class='MainContentTextContainer']/*[not(self::p[last()])]
But it does not work. Please point me to the correct XPath.
Thanks.
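Answer 0 (score: 0)
This XPath expression returns every text and element node inside the div that is followed by a sibling p, which means the last p and the nodes after it are not included: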
(//div[@class='MainContentTextContainer']/* | //div[@class='MainContentTextContainer']/text())[following-sibling::p]
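To check the expression, here is a minimal sketch in Python. It assumes the third-party lxml library, since the question does not say which XPath engine is in use. The sample markup is a shortened copy of the question's HTML, and the helper name trim_from_related_reports is made up for illustration. The sketch also shows why the original attempt fails: on the self axis last() is always 1, so not(self::p[last()]) just drops every p and keeps all other child elements, including those after 'Related Reports:'.

```python
from lxml import etree

# Shortened copy of the question's markup (long paragraph text abbreviated).
SAMPLE = """<div class="MainContentTextContainer">
<br/> Intro text. <br/>
<br/>
<p> Pricing note.
</p>
<p>
<b>Related Reports:</b>
</p>
<!-- [PID:6921310] -->
<a href="http://www.kaloramainformation.com/Global-Medical-Devices-6921310/" class="StandardLink DkBlueType">The Global Market for Medical Devices, 3rd. Edition</a>
<br/>May 2, 2012 - KLI3873247 - $1,995.00<br/>
<br/>
</div>"""

ASKER_XPATH = "//div[@class='MainContentTextContainer']/*[not(self::p[last()])]"
ANSWER_XPATH = (
    "(//div[@class='MainContentTextContainer']/*"
    " | //div[@class='MainContentTextContainer']/text())[following-sibling::p]"
)

root = etree.fromstring(SAMPLE)

# The asker's expression: every <p> is dropped, all other child elements stay.
asker_tags = [el.tag for el in root.xpath(ASKER_XPATH)]

# The answer's expression: child elements and text nodes that still have
# a <p> sibling somewhere after them. lxml returns text nodes as str objects.
kept = root.xpath(ANSWER_XPATH)
kept_tags = [n.tag for n in kept if not isinstance(n, str)]
kept_text = "".join(n for n in kept if isinstance(n, str))


def trim_from_related_reports(div):
    """Remove the last 'Related Reports:' <p> and every sibling after it, in place."""
    marker = div.xpath("p[contains(., 'Related Reports:')][last()]")[0]
    # Build the full list first so removal does not disturb the iteration.
    for node in [marker, *marker.itersiblings()]:
        div.remove(node)  # lxml removes the node's tail text along with it
    return div


trimmed = etree.tostring(trim_from_related_reports(root), encoding="unicode")

print(asker_tags)
print(kept_tags)
print(trimmed)
```

Note that an XPath query only selects nodes; it cannot hand back a modified copy of the div. If the trimmed div itself is needed, as in the expected output, the nodes from the marker p onward have to be removed in the host language, which is what the helper above does.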