Using XPath to select text while excluding certain parts

Date: 2015-10-30 20:06:02

Tags: html xml xpath xpath-2.0

How do I write an XPath expression that extracts "TEXT 1" but not "TEXT 2" and "TEXT 3"?

<div class="content">
    <div>
        <p>
TEXT 1 <span class="author"> TEXT 2</span>
     <a href="http://www.example.com" class="more" name="_chf_A_xxlformat_">TEXT 3</a>
    </p>
</div>
</div>

3 answers:

Answer 0: (score: 1)

Try this:

<xsl:value-of select="text()"/>

You probably used <xsl:value-of select="."/>, which takes the current node and recursively converts it to text. text() selects only the text nodes, excluding child elements and attributes.
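To make the difference between select="." and select="text()" concrete, here is a minimal sketch in Python. It uses only the standard library's xml.etree.ElementTree, whose XPath subset has no text() step, so the helpers string_value and direct_text_nodes (names invented for this illustration) emulate the two selections by hand on the markup from the question.

```python
import xml.etree.ElementTree as ET

# Markup from the question, copied as-is (it is well-formed XML).
HTML = """<div class="content">
    <div>
        <p>
TEXT 1 <span class="author"> TEXT 2</span>
     <a href="http://www.example.com" class="more" name="_chf_A_xxlformat_">TEXT 3</a>
    </p>
</div>
</div>"""

root = ET.fromstring(HTML)   # root is <div class="content">
p = root.find("./div/p")


def string_value(elem):
    """Rough analogue of select=".": all descendant text, concatenated."""
    return "".join(elem.itertext())


def direct_text_nodes(elem):
    """Rough analogue of select="text()": text sitting directly inside elem.

    ElementTree stores that text as elem.text plus the .tail of each child.
    """
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]


print(repr(string_value(p)))    # contains TEXT 1, TEXT 2 and TEXT 3
print(direct_text_nodes(p))     # three nodes; only the first has visible text
```

Only the first entry of the second result carries visible text. In XSLT 1.0, xsl:value-of outputs the string value of the first selected node, so select="text()" yields "TEXT 1" together with its surrounding whitespace.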

Answer 1: (score: 1)

Try this XPath:

$x("(//div[@class='content']/div/p/text())[1]");

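It may not be very elegant, but it seems to do the job :) Note that [1] picks the first text node that occurs; if the position of the text changes, the expression will no longer work.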

Regards, Andrey.
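The caveat about [1] can be sketched in Python as follows. The helper first_text_node and the reordered markup are hypothetical, made up for this illustration; the helper emulates (…/p/text())[1] by hand because the standard library's ElementTree does not support the text() step.

```python
import xml.etree.ElementTree as ET


def first_text_node(p):
    """Emulates (.../p/text())[1]: the first text node directly inside p."""
    nodes = [p.text] + [child.tail for child in p]
    return next((t for t in nodes if t), None)


# Layout as in the question: the wanted text precedes the child elements.
original = ET.fromstring('<p>\nTEXT 1 <span class="author"> TEXT 2</span></p>')

# Hypothetical variant: the span now comes first, on its own line.
reordered = ET.fromstring('<p>\n<span class="author"> TEXT 2</span> TEXT 1</p>')

print(repr(first_text_node(original)))   # the node holding TEXT 1
print(repr(first_text_node(reordered)))  # only the line break before <span>
```

In the reordered variant the first text node is just the line break in front of the span, so a purely positional expression silently returns whitespace instead of "TEXT 1".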

Answer 2: (score: 1)

This XPath selects the text nodes that are direct children of p:

//div[@class='content']/div/p/text()

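So "TEXT 2" and "TEXT 3" are excluded.

You may also want to discard the whitespace-only text nodes (the line breaks and indentation around the child elements):

//div[@class='content']/div/p/text()[normalize-space()]

In both XPath 1.0 and XPath 2.0 this selects only the text node containing "TEXT 1". The node keeps its own leading and trailing whitespace; wrap the expression in normalize-space() if you need the trimmed string "TEXT 1".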
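As a rough illustration of what the [normalize-space()] predicate does, the Python sketch below filters the direct text nodes of p by hand. The normalize_space helper is an approximation written for this example (Python's str.split treats a few more characters as whitespace than XPath does), and ElementTree is used only to parse the markup from the question.

```python
import xml.etree.ElementTree as ET

# Markup from the question, copied as-is.
HTML = """<div class="content">
    <div>
        <p>
TEXT 1 <span class="author"> TEXT 2</span>
     <a href="http://www.example.com" class="more" name="_chf_A_xxlformat_">TEXT 3</a>
    </p>
</div>
</div>"""


def normalize_space(s):
    """Approximates XPath normalize-space(): trim, collapse whitespace runs."""
    return " ".join(s.split())


p = ET.fromstring(HTML).find("./div/p")

# Text nodes that are direct children of p (what .../p/text() selects).
direct = [t for t in [p.text] + [child.tail for child in p] if t]

# The predicate keeps a node only if its normalized value is non-empty.
kept = [t for t in direct if normalize_space(t)]

print(kept)                      # ['\nTEXT 1 ']
print(normalize_space(kept[0]))  # TEXT 1
```

The predicate removes the two whitespace-only nodes but leaves the surviving node untrimmed, which is why a final normalize_space call is still needed to get the bare string.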