Extracting an attribute from a tag with Scrapy

Time: 2014-04-18 16:49:03

Tags: python html web-scraping web-crawler scrapy

I am using Scrapy to scrape this element:

<input class="xxxmail" type="text" readonly="readonly" value="xxx.org">

I only need "xxx.org". How can I retrieve it?

1 answer:

Answer 0: (score: 1)

You can use the following XPath expression:

//input[@class="xxxmail"]/@value

This returns the value attribute of every input tag whose class attribute is exactly "xxxmail".

In the spider, first instantiate a Selector from the response, then call xpath() on it and extract() on the result:

from scrapy.selector import Selector

sel = Selector(response)  # response is the object passed to the spider's callback
print(sel.xpath('//input[@class="xxxmail"]/@value').extract())