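I can scrape the HTML successfully with plain Scrapy, but not with Splash.
I don't know why the crawl fails.
Here is the code that uses Splash: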
import logging

import scrapy
from scrapy_splash import SplashRequest


class Spidernav(scrapy.Spider):
    name = "navarea"

    def start_requests(self):
        urls = [
            'http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html?fbclid=IwAR0NCPNZb0esQcqHL9nWPt9NaB9FaKhRU769_sdiUfsOJY8Rf-rOUmkFAWA'
        ]
        # Ask Splash to wait 0.5 s after the page loads before returning the HTML.
        splash_args = {'wait': 0.5}
        for url in urls:
            yield SplashRequest(url=url, callback=self.parse, args=splash_args, endpoint='render.html')

    def parse(self, response):
        logging.info('done')
        # filename = 'navarea.html'
        # with open(filename, 'wb') as f:
        #     f.write(response.body)
        # self.log('Saved file %s' % filename)
But it always gets stuck here and does not continue:
2018-12-15 12:57:18 [scrapy.core.engine] INFO: Spider opened
2018-12-15 12:57:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-15 12:57:18 [navarea] INFO: Spider opened: navarea
2018-12-15 12:57:18 [navarea] INFO: Spider opened: navarea
2018-12-15 12:57:18 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6032
After a few minutes, it prints the following:
2018-12-15 12:58:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-15 12:58:33 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html?fbclid=IwAR0NCPNZb0esQcqHL9nWPt9NaB9FaKhRU769_sdiUfsOJY8Rf-rOUmkFAWA via http://192.168.203.92:8050/render.html> (failed 1 times): TCP connection timed out: 60: Operation timed out.
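The retry message suggests the request never reached Splash: the TCP connection to 192.168.203.92:8050 (the address after "via" in the log) timed out. A minimal sketch for checking that from the machine running the spider, assuming Splash is supposed to be listening at that address. The helper name can_connect is hypothetical and not part of Scrapy or scrapy-splash.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect("192.168.203.92", 8050) with the host and port from the log would show whether anything accepts connections there. If it returns False, the problem is likely with the Splash instance or the network path to it rather than with the spider code.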