Scrapy spider shows an error while crawling

Date: 2017-07-14 11:54:56

Tags: python xml xpath scrapy web-crawler

I am trying to scrape coupons from a coupon website, but when I try to run the spider it shows an error. Please help. Thanks.

import scrapy
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class CuponationSpider(scrapy.spider):
    name = "cupo"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

    def parse(self, response):
        all_items = []
        divs_action = response.xpath('//div[@class="action"]')
        for div_action in divs_action:
            item = VoucherItem()
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
            all_items.append(item)





**Output / ERROR:**

    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
    2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up

1 Answer:

Answer 0 (score: 0):

Comment: ... tell me the mistake I am making

  1. Remove all of the import lines and use only:

    import scrapy
    
  2. Your class inheritance should be:

    class CuponationSpider(scrapy.Spider):
    
  3. You have changed the name and the start URL; use the following (points 1-3 are combined into a full spider sketch after the output below):

    name = "cuponation"
    allowed_domains = ['cuponation.in']
    start_urls = ['https://www.cuponation.in/firstcry-coupon']
    
  4. You are using Python 2.7. Sorry, I cannot run Scrapy with 2.7 here. The "ERROR: Unable to read instance data, giving up" message may be a different problem: it tells you that no data is received from the given URL. Maybe you have been blacklisted. (A hedged settings sketch addressing this is given at the end of this answer.)
  5. Comment: The URL is cuponation.in/firstcry-coupon#voucher

    It is the same page, so no reload is needed.
    All of this can be simplified to the following (a sketch of the VoucherItem definition follows after the output):

    all_items = []
    
    def parse(self, response):
        # Get all DIV with class="action"
        divs_action = response.xpath('//div[@class="action"]')
    
        for div_action in divs_action:
            item = VoucherItem()
    
            # Get SPAN from DIV with Attribute data-voucher-id
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
    
            # Copy Attribute voucher_id
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
    
            # Find SPAN class="code-field" inside span0 and copy Text
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
    
            all_items.append(item)
    
      

    Output

    #CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon
    #CouponSpider.parse()
    #CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c>
    {'voucher_id': '868600', 'code': '*******'}
    {'voucher_id': '31793', 'code': '*******'}
    {'voucher_id': '832408', 'code': '*******'}
    {'voucher_id': '819903', 'code': '*******'}
    {'voucher_id': '808774', 'code': '*******'}
    {'voucher_id': '32274', 'code': '*******'}
    {'voucher_id': '32102', 'code': '*******'}
    {'voucher_id': '844247', 'code': '*******'}
    {'voucher_id': '843513', 'code': '*******'}
    {'voucher_id': '848151', 'code': '*******'}
    {'voucher_id': '845248', 'code': '*******'}
    {'voucher_id': '869101', 'code': '*******'}
    {'voucher_id': '869328', 'code': '*******'}            
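
Both the question's code and the snippet above reference VoucherItem, which is never shown. A minimal sketch of such an item, assuming it lives in the project's items.py (the class and field names are taken from the usage above; everything else is an assumption):

    # items.py (hypothetical location); only the two fields used above are defined
    import scrapy

    class VoucherItem(scrapy.Item):
        voucher_id = scrapy.Field()  # filled from the data-voucher-id attribute
        code = scrapy.Field()        # filled from the span with class="code-field"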
    
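Putting points 1-3 together with the parse logic above, a minimal self-contained spider could look like the sketch below. The import path for VoucherItem is an assumption, and the yield line is an addition so that Scrapy's exporters and pipelines actually receive the scraped items:

    import scrapy

    # Assumed project layout; adjust the import to wherever VoucherItem is defined
    from myproject.items import VoucherItem

    class CuponationSpider(scrapy.Spider):
        name = "cuponation"
        allowed_domains = ['cuponation.in']
        start_urls = ['https://www.cuponation.in/firstcry-coupon']

        def parse(self, response):
            # Every DIV with class="action" holds one voucher offer
            for div_action in response.xpath('//div[@class="action"]'):
                span0 = div_action.xpath('./span[@data-voucher-id]')[0]

                item = VoucherItem()
                item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
                item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]

                # Yield instead of collecting into a list so the items reach the output
                yield item

Run it with, for example, scrapy crawl cuponation -o vouchers.json to write the collected items to a file.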
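
Regarding point 4: in older Scrapy releases the "[boto] ERROR: Unable to read instance data, giving up" message typically comes from boto probing the EC2 instance metadata service at startup, and the urllib2 timeout traceback in the question may be from that same probe rather than from the target site. Below is a hedged settings.py sketch with commonly suggested tweaks for both symptoms; every value is an illustrative assumption, not something taken from the question:

    # settings.py (illustrative values only)

    # Send a browser-like User-Agent in case the default one is being blocked
    USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0 Safari/537.36'

    # Be patient with a slow or throttling site
    DOWNLOAD_DELAY = 1
    DOWNLOAD_TIMEOUT = 60
    RETRY_TIMES = 3

    # Silence the boto instance-metadata error by disabling the S3 download
    # handler (safe if the spider never fetches s3:// URLs)
    DOWNLOAD_HANDLERS = {'s3': None}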