Scrapy spider shows an error while crawling

Date: 2017-07-14 11:54:56

Tags: python xml xpath scrapy web-crawler

I am trying to scrape coupons from a coupon website, but when I try to run the spider it shows an error. Please help. Thanks.

import scrapy
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class CuponationSpider(scrapy.spider):
    name = "cupo"
    allowed_domains = ["cuponation.in"]
    start_urls = ["https://www.cuponation.in/firstcry-coupon#voucher"]

    def parse(self, response):
        all_items = []
        divs_action = response.xpath('//div[@class="action"]')
        for div_action in divs_action:
            item = VoucherItem()
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
            all_items.append(item)





**Output / ERROR:**

    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>
    2017-07-25 16:36:59 [boto] ERROR: Unable to read instance data, giving up

1 Answer:

Answer 0 (score: 0):

Comment: ... tell me the mistake I am making

  1. Remove all of the import lines and use only:

    import scrapy
    
  2. Your class inheritance should be:

    class CuponationSpider(scrapy.Spider):
    
  3. You have changed the name and the start URL; use the following (points 1-3 are combined into a full spider sketch after the output below):

    name = "cuponation"
    allowed_domains = ['cuponation.in']
    start_urls = ['https://www.cuponation.in/firstcry-coupon']
    
  4. You are using Python 2.7. Sorry, I cannot run Scrapy with 2.7 here. The "ERROR: Unable to read instance data, giving up" message may be a different problem: it tells you that no data is received from the given URL. Maybe you have been blacklisted. (A hedged settings sketch addressing this is given at the end of this answer.)
  5. Comment: The URL is cuponation.in/firstcry-coupon#voucher

    It is the same page, so no reload is needed.
    All of this can be simplified to the following (a sketch of the VoucherItem definition follows after the output):

    all_items = []
    
    def parse(self, response):
        # Get all DIV with class="action"
        divs_action = response.xpath('//div[@class="action"]')
    
        for div_action in divs_action:
            item = VoucherItem()
    
            # Get SPAN from DIV with Attribute data-voucher-id
            span0 = div_action.xpath('./span[@data-voucher-id]')[0]
    
            # Copy Attribute voucher_id
            item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
    
            # Find SPAN class="code-field" inside span0 and copy Text
            item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]
    
            all_items.append(item)
    
      

    Output

    #CouponSpider.start_requests:https://www.cuponation.in/firstcry-coupon
    #CouponSpider.parse()
    #CouponSpider.divs_action:List[13] of <Element div at 0xf6b1c20c>
    {'voucher_id': '868600', 'code': '*******'}
    {'voucher_id': '31793', 'code': '*******'}
    {'voucher_id': '832408', 'code': '*******'}
    {'voucher_id': '819903', 'code': '*******'}
    {'voucher_id': '808774', 'code': '*******'}
    {'voucher_id': '32274', 'code': '*******'}
    {'voucher_id': '32102', 'code': '*******'}
    {'voucher_id': '844247', 'code': '*******'}
    {'voucher_id': '843513', 'code': '*******'}
    {'voucher_id': '848151', 'code': '*******'}
    {'voucher_id': '845248', 'code': '*******'}
    {'voucher_id': '869101', 'code': '*******'}
    {'voucher_id': '869328', 'code': '*******'}            
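
Both the question's code and the snippet above reference VoucherItem, which is never shown. A minimal sketch of such an item, assuming it lives in the project's items.py (the class and field names are taken from the usage above; everything else is an assumption):

    # items.py (hypothetical location); only the two fields used above are defined
    import scrapy

    class VoucherItem(scrapy.Item):
        voucher_id = scrapy.Field()  # filled from the data-voucher-id attribute
        code = scrapy.Field()        # filled from the span with class="code-field"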
    
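Putting points 1-3 together with the parse logic above, a minimal self-contained spider could look like the sketch below. The import path for VoucherItem is an assumption, and the yield line is an addition so that Scrapy's exporters and pipelines actually receive the scraped items:

    import scrapy

    # Assumed project layout; adjust the import to wherever VoucherItem is defined
    from myproject.items import VoucherItem

    class CuponationSpider(scrapy.Spider):
        name = "cuponation"
        allowed_domains = ['cuponation.in']
        start_urls = ['https://www.cuponation.in/firstcry-coupon']

        def parse(self, response):
            # Every DIV with class="action" holds one voucher offer
            for div_action in response.xpath('//div[@class="action"]'):
                span0 = div_action.xpath('./span[@data-voucher-id]')[0]

                item = VoucherItem()
                item['voucher_id'] = span0.xpath('./@data-voucher-id').extract()[0]
                item['code'] = span0.xpath('./span[@class="code-field"]/text()').extract()[0]

                # Yield instead of collecting into a list so the items reach the output
                yield item

Run it with, for example, scrapy crawl cuponation -o vouchers.json to write the collected items to a file.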
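
Regarding point 4: in older Scrapy releases the "[boto] ERROR: Unable to read instance data, giving up" message typically comes from boto probing the EC2 instance metadata service at startup, and the urllib2 timeout traceback in the question may be from that same probe rather than from the target site. Below is a hedged settings.py sketch with commonly suggested tweaks for both symptoms; every value is an illustrative assumption, not something taken from the question:

    # settings.py (illustrative values only)

    # Send a browser-like User-Agent in case the default one is being blocked
    USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0 Safari/537.36'

    # Be patient with a slow or throttling site
    DOWNLOAD_DELAY = 1
    DOWNLOAD_TIMEOUT = 60
    RETRY_TIMES = 3

    # Silence the boto instance-metadata error by disabling the S3 download
    # handler (safe if the spider never fetches s3:// URLs)
    DOWNLOAD_HANDLERS = {'s3': None}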