XPath: if a td element contains "Phone:", get the value of its sibling

Time: 2014-04-04 07:42:10

Tags: python xpath scrapy

I have the following case:

...
...

<tr>
    <td class="company-info">Phone:</td>
    <td> "020 641512" <span class="provider">ABC</span></td>
</tr>
....

I want to:

  • If the value of a <td> is Phone:, get the phone number (020 641512) from the next <td>

I imagined something like this:

phone = hxs.xpath("//td/text()[contains('Phone:')]", "Not available")

3 answers:

Answer 0 (score: 1)

I think you need:

//td[contains(., 'Phone:')]/following-sibling::td/substring-before(substring-after(normalize-space(text()[1]), '&quot;'), '&quot;')

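The expression above works in XQuery (it uses a function call as the last path step, which requires XPath 2.0). If it doesn't work for you, try: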

//td[contains(., 'Phone:')]/following-sibling::td/text()[1]

which outputs [space]"020 641512"
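
The raw text node still carries the surrounding whitespace and the literal quotes, so it needs a small cleanup step. Here is a minimal sketch of that idea using only the standard library. Note the assumptions: xml.etree.ElementTree does not support the following-sibling axis, so the sibling lookup is emulated in plain Python rather than with the XPath above; the helper name phone_from_rows and the <table> wrapper around the sample row are hypothetical, added only to make the snippet self-contained.

```python
import xml.etree.ElementTree as ET

# Sample row copied from the question, wrapped in <table> (an assumption)
# so that it parses as a single well-formed XML document.
SAMPLE = """<table>
<tr>
    <td class="company-info">Phone:</td>
    <td> "020 641512" <span class="provider">ABC</span></td>
</tr>
</table>"""


def phone_from_rows(markup, default="Not available"):
    """Hypothetical helper: return the cleaned text of the <td> after 'Phone:'."""
    root = ET.fromstring(markup)
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Pair each cell with its immediate next sibling cell.
        for label, value in zip(cells, cells[1:]):
            if "Phone:" in "".join(label.itertext()):
                # value.text is the text before the first child element,
                # i.e. the counterpart of text()[1] in the XPath.
                raw = value.text or ""
                # Drop the whitespace, then the literal double quotes.
                return raw.strip().strip('"').strip() or default
    return default


print(phone_from_rows(SAMPLE))  # 020 641512
```

The default argument covers the "Not available" fallback that the question's attempt hints at.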

Answer 1 (score: 1)

Using Scrapy's Selector and SelectorList, you can use regular expressions via their .re() method:

>>> hxs.xpath('//td[contains(., "Phone")]/following-sibling::td[1]').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>> 

Alternatively, using the new CSS selectors:

>>> from scrapy.selector import Selector
>>> selector = Selector(response)
>>> selector.css('td:contains("Phone") + td').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>> 
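
To see what the pattern in this answer actually captures, independent of Scrapy, here is a small sketch with the standard re module. It assumes the regex is applied to the serialized cell markup, which is roughly what .re() does with the selected element; the helper first_phone and the "Not available" default are hypothetical names added for illustration, the latter borrowed from the question's attempt.

```python
import re

# Same pattern as in the answer: a digit, then digits or spaces, then a digit.
PHONE_RE = re.compile(r'(\d[\d ]+\d)')


def first_phone(text, default="Not available"):
    """Hypothetical helper: first phone-like run of digits and spaces, else default."""
    match = PHONE_RE.search(text)
    return match.group(1) if match else default


# The second cell from the question, as serialized markup.
cell_html = '<td> "020 641512" <span class="provider">ABC</span></td>'

print(first_phone(cell_html))        # 020 641512
print(first_phone("<td>n/a</td>"))   # Not available
```

Because the pattern must start and end on a digit, the quotes and the surrounding whitespace are excluded from the match without any extra stripping. Newer Scrapy releases also provide .re_first() with a default argument, which should give the same fallback behaviour directly on a selector.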

Answer 2 (score: -1)

There is also a very useful Firefox add-on called Firebug for finding out XPaths; see these instructions.