I am trying to scrape coupons from a coupon website, but when I try to run the spider it shows an error. Please help. Thanks.
import scrapy
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class CuponationSpider(scrapy.spider):
    name = "cupo"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

    def parse(self, response):
        all_items = []
        divs_action = response.xpath('//div[@class="action"]')
        for div_action in divs_action:
            item = VoucherItem()
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
            all_items.append(item)
Output (error):
File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up
Answer 0 (score: 0)
Comment: ... tell me what mistake I am making
Remove all the import lines and use only:
import scrapy
Your class should inherit from scrapy.Spider (capital S):
class CuponationSpider(scrapy.Spider):
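To see why the lowercase name matters, here is a minimal sketch that does not need Scrapy at all. It assumes that `scrapy.spider` in the question resolves to a module (the question's own `from scrapy.spider import BaseSpider` line suggests so); `fake_module`, `RealBase` and `try_subclass` are hypothetical names used only for this illustration. Python refuses to build a class whose base is a module object:

```python
import types

# Hypothetical stand-in for a module object such as `scrapy.spider`.
fake_module = types.ModuleType("spider")


class RealBase(object):
    """Hypothetical stand-in for a real base class such as `scrapy.Spider`."""


def try_subclass(base):
    """Return the error message raised when subclassing `base`, or None on success."""
    try:
        class Demo(base):
            pass
    except TypeError as exc:
        return str(exc)
    return None


# Subclassing a module fails with a TypeError; subclassing a class works.
print(try_subclass(fake_module))
print(try_subclass(RealBase))
```

The first call prints a TypeError message and the second prints None, which matches the advice above to inherit from the class rather than the module.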
You have changed name and start_urls; please use:
name = "cuponation"
allowed_domains = ['cuponation.in']
start_urls = ['https://www.cuponation.in/firstcry-coupon']
Scrapy. This may be different.
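The error "Unable to read instance data, giving up" tells you that no data was received from the given URL. Maybe you are blacklisted.
Comment: The URL is cuponation.in/firstcry-coupon#voucher
That is the same page; there is no need to reload it.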
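The point about the two URLs can be checked with the standard library. The part after `#` is a fragment, which is handled on the client side and is not sent to the server, so both URLs request the same page. This is only a sketch and assumes Python 3 (on Python 2 the same function lives in the `urlparse` module); the variable names are made up for the example:

```python
from urllib.parse import urldefrag

# URL from the question and URL suggested in the answer.
question_url = "https://www.cuponation.in/firstcry-coupon#voucher"
answer_url = "https://www.cuponation.in/firstcry-coupon"

# Split the URL into the part that is requested and the fragment.
base, fragment = urldefrag(question_url)

print(base)
print(fragment)
print(base == answer_url)
```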
All of this can be simplified to the following:
# Method of CuponationSpider
def parse(self, response):
    # Collect the items inside parse(); a list defined outside would not be visible here
    all_items = []
    # Get all DIV with class="action"
    divs_action = response.xpath('//div[@class="action"]')
    for div_action in divs_action:
        item = VoucherItem()
        # Get SPAN from DIV with Attribute data-voucher-id
        span0 = div_action.xpath('./span[@data-voucher-id]')[0]
        # Copy Attribute voucher_id
        item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
        # Find SPAN class="code-field" inside span0 and copy Text
        item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
        all_items.append(item)
    # Hand the collected items back to Scrapy
    return all_items
Output:
#CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon
#CouponSpider.parse()
#CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c>
{'voucher_id': '868600', 'code': '*******'}
{'voucher_id': '31793', 'code': '*******'}
{'voucher_id': '832408', 'code': '*******'}
{'voucher_id': '819903', 'code': '*******'}
{'voucher_id': '808774', 'code': '*******'}
{'voucher_id': '32274', 'code': '*******'}
{'voucher_id': '32102', 'code': '*******'}
{'voucher_id': '844247', 'code': '*******'}
{'voucher_id': '843513', 'code': '*******'}
{'voucher_id': '848151', 'code': '*******'}
{'voucher_id': '845248', 'code': '*******'}
{'voucher_id': '869101', 'code': '*******'}
{'voucher_id': '869328', 'code': '*******'}
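If the site cannot be reached, the XPath steps used in parse() can still be tried offline. The sketch below is not Scrapy: it swaps in the standard library's `xml.etree.ElementTree`, which supports the small XPath subset needed here. The `SAMPLE` markup is hypothetical, guessed only from the XPath expressions above and not copied from the real page, and plain dicts stand in for `VoucherItem`, whose definition is not shown in this thread. Unlike the `[0]` indexing above, it skips a DIV without a matching SPAN instead of raising an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup shaped after the XPath expressions in parse().
SAMPLE = """
<html><body>
  <div class="action">
    <span data-voucher-id="111"><span class="code-field">AAA</span></span>
  </div>
  <div class="other"><span data-voucher-id="999"/></div>
  <div class="action">
    <span data-voucher-id="222"><span class="code-field">BBB</span></span>
  </div>
</body></html>
"""


def extract_vouchers(markup):
    """Return one dict per DIV with class="action" that holds a voucher SPAN."""
    root = ET.fromstring(markup)
    items = []
    # Get all DIV with class="action"
    for div_action in root.findall(".//div[@class='action']"):
        # Get the direct child SPAN that carries a data-voucher-id attribute
        span0 = div_action.find("./span[@data-voucher-id]")
        if span0 is None:
            continue
        # Find SPAN class="code-field" inside span0 and copy its text
        code = span0.find("./span[@class='code-field']")
        items.append({
            "voucher_id": span0.get("data-voucher-id"),
            "code": code.text if code is not None else None,
        })
    return items


vouchers = extract_vouchers(SAMPLE)
for voucher in vouchers:
    print(voucher)
```

Only the two DIVs with class="action" produce an item; the DIV with class="other" is ignored, which is the same filtering the spider relies on.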