XPath for text at the end of an HTML tag

Time: 2018-05-11 13:30:37

Tags: html xpath tags

What is the XPath for the following case?

I am trying to get the part that says:

"Here is my text that I would like to get out of this. Bla Bla..."

As you can see in the HTML code, this text is located at the end of the article tag.

I tried:

xpath = "/article()[last()]"

But that does not work!

I also tried:

xpath = "//*[contains(@itemtype, 'http://schema.org/Article')]"

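But that does not work either...

I guess the problem is that there are other tags inside the HTML code, or that I am doing something wrong in general...

Here is the HTML code: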

<div class="cbox"><article class="cf" itemscope itemtype="http://schema.org/Article">
<header>
<h1 itemprop="headline">Anzündhilfen: So bringen Sie die Kohle zur Weissglut</h1>
<em class="date">
<span class="my-color" itemprop="publisher">MyMagzine</span> 09/2018 vom <time datetime="2018-05-08" itemprop="datePublished">8. Mai 2018</time> | aktualisiert am <time datetime="2018-05-11" itemprop="dateModified">11. Mai 2018</time> </em>
<p>
von <span itemprop='author'>My Author</span> </p>
</header>
<p class="lead">Eine perfekte Glut ohne Rauch und Gestank bringen nur sogenannte Anzündkamine ­zustande. Aber zwei solche Produkte sind ­unsicher. </p>
<figure class="image-box cf" itemscope itemtype="http://schema.org/ImageObject">
<img src="/image/?m=Artikel&rid=1113094&attr=bild&thumb=thumb_yRsBeq_resize_300_200.png" alt="Funken sprühen  (Bild: CHRISTIAN BIRMELE)" itemprop="contentUrl">
<figcaption>
<p itemprop="description">Funken sprühen (Bild: CHRISTIAN BIRMELE)</p>
</figcaption>
</figure>
Here is my text that I would like to get out of this. Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla.&nbsp;
<br /> <br />My Magazine has this title inbetween&nbsp;
<br /> <br />Here is more text I also want to get our of this. [...]</p>
</article>

1 answer:

Answer 0 (score: 2)

Try the following XPath to get the required text:

//article/figure/following-sibling::text()
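
The expression above selects the text nodes that come after the figure element as its siblings inside the article element. As a rough illustration of what that means, here is a minimal Python sketch that uses only the standard library. Python's xml.etree.ElementTree supports only a subset of XPath and has no following-sibling axis or text() node test, so the sketch reads the same text nodes through each element's .tail attribute instead. The SNIPPET markup is a simplified, well-formed stand-in for the HTML in the question, and the helper name trailing_text is hypothetical. The real page is not valid XML and would need an HTML parser. A library with full XPath 1.0 support (lxml is one) should be able to evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Assumption: a simplified, well-formed stand-in for the question's markup.
# The original HTML (unclosed <img>, &nbsp;, valueless attributes) is not
# valid XML, so ElementTree could not parse it as-is.
SNIPPET = """
<div class="cbox"><article class="cf">
<p class="lead">Lead paragraph.</p>
<figure class="image-box cf">
<figcaption><p>Caption</p></figcaption>
</figure>
Here is my text that I would like to get out of this.
<br /> <br />My Magazine has this title inbetween
<br /> <br />Here is more text I also want to get out of this.
</article></div>
"""


def trailing_text(article):
    """Return the non-empty text nodes that follow <figure> as siblings.

    In ElementTree, text placed directly after an element's closing tag is
    stored in that element's .tail, so the text nodes after <figure> are the
    tails of <figure> itself and of every later child of <article>.
    """
    children = list(article)
    figure = article.find("figure")
    start = children.index(figure)
    parts = []
    for node in children[start:]:
        tail = (node.tail or "").strip()
        if tail:  # skip whitespace-only nodes such as the gap between <br /> tags
            parts.append(tail)
    return parts


root = ET.fromstring(SNIPPET)
article = root.find("article")
texts = trailing_text(article)
for t in texts:
    print(t)
```

Unlike the XPath expression, which also returns whitespace-only text nodes (for example the space between two br tags), this sketch strips and drops them, which is usually what is wanted when extracting the article body.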