Question

I am trying to scrape some pages from inside my company's IT network using Scrapy and Python. I started with the Scrapy tutorial here: https://doc.scrapy.org/en/latest/intro/tutorial.html

When I run the same code as on the tutorial page, I get this error:

2018-01-24 11:49:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: quotes.toscrape.com.

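So I tried to set my proxy server to get a connection, which I also have to do for pip install (just as an example). I did this by changing the tutorial's code using Amom's approach from "Scrapy and proxies":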

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            request = scrapy.Request(url=url, callback=self.parse)
            # Placeholder for the real proxy address
            request.meta['proxy'] = "user@proxy:port"
            yield request

    def parse(self, response):
        # The second-to-last URL segment is the page number
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Does anyone know how to solve this? I really need to get it working. Thanks in advance.

Answer 1

This means they are blocking Scrapy, i.e. they do not allow anyone to scrape their website. Sorry, there is nothing you can do about it.
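
One caveat worth checking before giving up: the log line reports a DNS lookup failure on the robots.txt request, which can also happen when that request never goes through the proxy at all. The sketch below is only an assumption about the setup, not a confirmed fix. It builds a proxy URL that includes the scheme, and exports it through the http_proxy / https_proxy environment variables, which Scrapy's HttpProxyMiddleware is documented to honor for requests that carry no meta['proxy'] of their own. The helper build_proxy_url and the host proxy.example.com are hypothetical names used for illustration.

```python
import os
from urllib.parse import quote
from urllib.request import getproxies


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL that includes the scheme, e.g. http://user@host:3128."""
    credentials = ""
    if user:
        # Percent-encode credentials so characters like '@' do not break the URL
        credentials = quote(user, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


# Hypothetical values; replace them with the real proxy details.
proxy_url = build_proxy_url("proxy.example.com", 3128, user="user")

# Per-request use inside the spider would then look like:
#     request.meta['proxy'] = proxy_url

# Process-wide use: set the variables before the crawler starts so that
# requests without meta['proxy'] can pick up the proxy as well.
os.environ["http_proxy"] = proxy_url
os.environ["https_proxy"] = proxy_url

# Show what the standard library now reports as the HTTP proxy
print(getproxies().get("http"))
```

If the robots.txt request still fails after this, setting ROBOTSTXT_OBEY = False in settings.py is another thing to try, assuming the site's policy permits it.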
