Question

I have the following case:

...
...

<tr>
    <td class="company-info">Phone:</td>
    <td> "020 641512" <span class="provider">ABC</span></td>
</tr>
....

What I want: if the value of a <td> is Phone:, get 020 641512 from the next <td>.

I imagined something like this:

phone = hxs.xpath("//td/text()[contains('Phone:')]", "Not available")

Answer 1

I think you need:

//td[contains(., 'Phone:')]/following-sibling::td/substring-before(substring-after(normalize-space(text()[1]), '"'), '"')

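The expression above works in XQuery (a function call as the last path step needs XPath 2.0 or later). If it does not work for you, try: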

//td[contains(., 'Phone:')]/following-sibling::td/text()[1]

This returns [space]"020 641512", with the leading space and the quotation marks still included.
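The lookup and the cleanup of that raw value can be sketched in plain Python. This is only an illustration: it uses the standard library's xml.etree.ElementTree instead of a real XPath engine, the helper name value_after_label is hypothetical, and the sample row is wrapped in a <table> element so that it parses as well-formed XML. The "Not available" default is borrowed from the attempt in the question.

```python
import xml.etree.ElementTree as ET

# Sample row from the question, wrapped in <table> so it is well-formed XML.
SAMPLE = '''<table>
<tr>
    <td class="company-info">Phone:</td>
    <td> "020 641512" <span class="provider">ABC</span></td>
</tr>
</table>'''


def value_after_label(markup, label, default="Not available"):
    """Return the cleaned text of the <td> right after the <td> containing `label`.

    Hypothetical helper, not part of Scrapy or lxml.
    """
    root = ET.fromstring(markup)
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Pair every cell with the cell that immediately follows it.
        for cell, sibling in zip(cells, cells[1:]):
            if label in "".join(cell.itertext()):
                # sibling.text is the text before the first child element,
                # which corresponds to text()[1] in the XPath expression.
                raw = sibling.text or ""
                # Drop surrounding whitespace first, then the quotation marks.
                return raw.strip().strip('"')
    return default


phone = value_after_label(SAMPLE, "Phone:")
print(phone)
```

With a real XPath 1.0 engine, the same strip().strip('"') step could be applied to the string returned by the second expression.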

Answer 2

With Scrapy's Selector and SelectorList, you can use regular expressions via their .re() method:

>>> hxs.xpath('//td[contains(., "Phone")]/following-sibling::td[1]').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>>

Alternatively, using the new CSS selectors:

>>> from scrapy.selector import Selector
>>> selector = Selector(response)
>>> selector.css('td:contains("Phone") + td').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>>
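To see what the pattern itself captures, here is a small sketch using only the standard re module. The cell_html string is an assumption: it approximates the serialized second <td> that .re() would be matching against, rather than output taken from a real Scrapy run.

```python
import re

# Assumed serialized form of the second <td> from the question.
cell_html = '<td> "020 641512" <span class="provider">ABC</span></td>'

# One digit, then one or more digits or spaces, then a final digit.
PHONE_RE = re.compile(r'(\d[\d ]+\d)')

matches = PHONE_RE.findall(cell_html)
print(matches)

# The pattern only allows digits and spaces, so a dash splits the number.
split_matches = PHONE_RE.findall('020-641512')
print(split_matches)
```

Because the pattern accepts only digits and spaces, numbers written with dashes, parentheses, or a leading + would need a wider character class.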

Answer 3

There is also a very useful Firefox plugin for finding XPaths called Firebug; take a look at these instructions.

XPath: if an element has a certain value, say "Phone", then get its sibling's value
