What is the XPath for the following case?
I am trying to get the part that says:
"Here is my text that I would like to get out of this. Bla Bla..."
As you can see in the HTML code, this text sits at the end of the article tag.
I tried:
xpath = "/article()[last()]"
But that doesn't work!
I also tried:
xpath = "//*[contains(@itemtype, 'http://schema.org/Article')]"
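But that doesn't work either...
I think the problem is either that there are other tags in the HTML code, or that I am doing something wrong in general...
Here is the HTML code: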
<div class="cbox"><article class="cf" itemscope itemtype="http://schema.org/Article">
<header>
<h1 itemprop="headline">Anzündhilfen: So bringen Sie die Kohle zur Weissglut</h1>
<em class="date">
<span class="my-color" itemprop="publisher">MyMagzine</span> 09/2018 vom <time datetime="2018-05-08" itemprop="datePublished">8. Mai 2018</time> | aktualisiert am <time datetime="2018-05-11" itemprop="dateModified">11. Mai 2018</time> </em>
<p>
von <span itemprop='author'>My Author</span> </p>
</header>
<p class="lead">Eine perfekte Glut ohne Rauch und Gestank bringen nur sogenannte Anzündkamine zustande. Aber zwei solche Produkte sind unsicher. </p>
<figure class="image-box cf" itemscope itemtype="http://schema.org/ImageObject">
<img src="/image/?m=Artikel&rid=1113094&attr=bild&thumb=thumb_yRsBeq_resize_300_200.png" alt="Funken sprühen (Bild: CHRISTIAN BIRMELE)" itemprop="contentUrl">
<figcaption>
<p itemprop="description">Funken sprühen (Bild: CHRISTIAN BIRMELE)</p>
</figcaption>
</figure>
Here is my text that I would like to get out of this. Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla
Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla Bla.
<br /> <br />My Magazine has this title inbetween
<br /> <br />Here is more text I also want to get our of this. [...]</p>
</article>
Answer 0 (score: 2)
Try the following XPath to get the desired text:
//article/figure/following-sibling::text()
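
Some background on why this expression fits: "/article()[last()]" is not valid XPath, because article() is not a node test, and the contains(@itemtype, ...) expression selects the article element as a whole rather than only the loose text at its end. The wanted text is made up of bare text nodes that come after figure inside article, which is what following-sibling::text() picks up. Below is a minimal sketch of that idea using only the Python standard library. Note the assumptions: the SNIPPET is a simplified, well-formed stand-in for the markup in the question, text_after_figure is a hypothetical helper name, and since xml.etree.ElementTree does not support the following-sibling axis or text(), the sketch reads the .tail text of figure and of each element after it instead of evaluating the XPath directly.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the question's markup (an assumption;
# the real page is HTML and may need a more forgiving parser).
SNIPPET = """
<article class="cf">
  <header><h1>Headline</h1></header>
  <p class="lead">Lead paragraph.</p>
  <figure><figcaption><p>Caption</p></figcaption></figure>
  Here is my text that I would like to get out of this.
  <br /> <br />My Magazine has this title inbetween
  <br /> <br />Here is more text I also want to get out of this.
</article>
"""


def text_after_figure(article):
    """Collect the text that follows <figure> among article's direct children.

    In ElementTree, text that comes right after an element's closing tag is
    stored in that element's .tail, so the tails of <figure> and of every
    later sibling cover the same text nodes the XPath would select.
    """
    children = list(article)
    figure = article.find("figure")  # first direct <figure> child
    if figure is None:
        return []
    pieces = []
    for child in children[children.index(figure):]:
        tail = (child.tail or "").strip()
        if tail:  # skip whitespace-only gaps between the <br /> tags
            pieces.append(tail)
    return pieces


article = ET.fromstring(SNIPPET.strip())
result = text_after_figure(article)
for piece in result:
    print(piece)
```

With a parser that implements full XPath 1.0, the expression from the answer can be used as-is; in that case expect the whitespace-only text nodes between the br tags to be returned as well, so some filtering like the strip() above is still likely to be needed.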