Filtering/excluding parts of an XPath extraction by "pattern"

Time: 2014-01-09 12:40:54

Tags: xpath

This is the markup I have to work with:

<div class="Pictures zoom">

<a title="Productname 1" class="zoomThumbActive" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">

<img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>

<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">

<img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>

</div>

Using the following XPath:

/html//div[@class='Pictures zoom']/a/@rel

the output is:

{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}
{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}

Is it possible to filter the extraction so that, from the output above, I get only these:

/images/76.24561/big-one-picture.jpeg
/images/9.5664/very-big-one-picture.jpeg

In other words, I only want to keep everything between largeimage: ' and '}

Best regards,

Liu Kang

1 answer:

Answer 0 (score: 1)

Use substring-before and substring-after to cut away the parts you do not want.

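With XPath 1.0, this can only be done for a single result: when a string function is given a node-set, it uses only the first node, so you cannot get all the URLs contained in a document with a single XPath call. This query returns the first URL: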

substring-before(substring-after((//@rel)[1], "largeimage: '"), "'")
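If only an XPath 1.0 engine is available, one common workaround is to select all the rel attributes with a plain path and do the cutting in the host language, once per node. The following is a minimal sketch of that idea, not the answer's own method. It assumes Python with the standard library's xml.etree.ElementTree, and it assumes the markup is well-formed XML (the question's snippet wrapped in html/body, with the last div closed). The helper names substring_after, substring_before and large_image_urls are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's snippet, wrapped in <html><body> and with the
# final <div> closed so that it parses as well-formed XML.
SNIPPET = """<html><body><div class="Pictures zoom">
<a title="Productname 1" class="zoomThumbActive" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div></body></html>"""


def substring_after(text, marker):
    # Hypothetical helper: like XPath substring-after() for a non-empty
    # marker; returns "" when the marker does not occur.
    _, found, tail = text.partition(marker)
    return tail if found else ""


def substring_before(text, marker):
    # Hypothetical helper: like XPath substring-before() for a non-empty
    # marker; returns "" when the marker does not occur.
    head, found, _ = text.partition(marker)
    return head if found else ""


def large_image_urls(markup):
    root = ET.fromstring(markup)
    # ElementTree supports only a subset of XPath, so select the <a>
    # elements here and read the rel attribute in Python.
    anchors = root.findall(".//div[@class='Pictures zoom']/a")
    rels = [a.get("rel", "") for a in anchors]
    # Apply the same cut as the XPath query, once per attribute value.
    return [
        substring_before(substring_after(rel, "largeimage: '"), "'")
        for rel in rels
    ]


urls = large_image_urls(SNIPPET)
for url in urls:
    print(url)
```

Real-world HTML is often not well-formed XML, so in practice an HTML-aware parser would probably be needed in place of ET.fromstring; the per-node loop stays the same.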

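XPath 2.0 allows you to use a function call as a path step, so it is evaluated once for each selected node. This query returns all the URLs you are looking for, each as a separate string: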

//@rel/substring-before(substring-after(., "largeimage: '"), "'")