Scrapy: can't change the request object's URL in DownloadMiddleware's process_request

Asked: 2013-10-05 13:47:38

Tags: python scrapy

I'm trying to build a Scrapy downloader middleware that changes the URL of the request object. But process_request doesn't seem to work: the downloaded page is still the one at the original URL. My code is as follows:

#middlewares.py
class UrlModifyMiddleware(object):
    def process_request(self, request, spider):
        original_url = request.url
        m_url = 'http://whatsmyuseragent.com/'
        request.url = m_url
        #request = request.replace(url=m_url)

The spider's code:

#spider/test_spider.py
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request

class TestSpider(CrawlSpider):
    name = "urltest"
    start_url = "http://www.icanhazip.com/"

    def start_requests(self):
        yield Request(self.start_url, callback=self.parse_start)

    def parse_start(self, response):
        html_page = response.body
        with open('test.html', 'wb') as f:
            f.write(html_page)

In settings.py I set:

DOWNLOADER_MIDDLEWARES = {
    'official_index.middlewares.UrlModifyMiddleware': 100,
}

1 Answer:

Answer 0 (score: 0)

I tested this exact code with Scrapy 0.18.0 and it works fine. I suspect this may be a bug in version 0.14.4.