How to get the first requested URL after a 302 redirect followed by a 301

Time: 2016-12-03 19:01:06

Tags: python python-2.7 redirect scrapy

I am using Scrapy (version 1.1.1) to scrape the web. This is the spider I am working with:

import codecs

import scrapy


class Link_Spider(scrapy.Spider):
    name = 'GetLink'
    allowed_domains = ['example_0.com']
    # Read the start URLs from link.txt, one per line.
    with codecs.open('link.txt', 'r', 'utf-8') as f:
        start_urls = [url.strip() for url in f.readlines()]

    def parse(self, response):
        print(response.url)

In the code above, start_urls is a list:

start_urls = [
    'http://example_0.com/?id=0',
    'http://example_0.com/?id=1',
    'http://example_0.com/?id=2',
]  # and so on

When Scrapy runs, the debug output shows:

[scrapy] DEBUG: Redirecting (302) to <GET https://example_1.com/?subid=poison_apple> from <GET http://example_0.com/?id=0>
[scrapy] DEBUG: Redirecting (301) to <GET https://example_1/ture_a.html> from <GET https://example_1.com/?subid=poison_apple>
[scrapy] DEBUG: Crawled (200) <GET https://example_1/ture_a.html> (referer: None)

Now, how can I tell which 'http://example_0.com/?id=***' entry in start_urls corresponds to the final URL 'https://example_1/ture_a.html'? Can anyone help me?

1 Answer:

Answer 0: (score: 0)

Every response has its request attached, so you can retrieve the original URL from it:

def parse(self, response):
    print('original url:')
    print(response.request.url)
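
One caveat: when Scrapy's RedirectMiddleware follows a redirect, the request attached to the final response is normally the redirected one, so response.request.url may simply equal response.url. The middleware records the URLs it redirected away from under the 'redirect_urls' key of the request meta, and the first entry there should be the URL from start_urls. A minimal sketch, assuming RedirectMiddleware is enabled (it is by default); the helper name first_requested_url is hypothetical, not part of Scrapy:

```python
def first_requested_url(response):
    """Return the URL that was requested before any redirects.

    Assumes Scrapy's RedirectMiddleware is enabled: it stores every URL
    it redirected away from in meta['redirect_urls'], oldest first.
    """
    redirect_urls = response.meta.get('redirect_urls')
    if redirect_urls:
        # The first entry is the URL of the very first request.
        return redirect_urls[0]
    # No redirect happened, so the response URL is the requested URL.
    return response.url
```

Inside parse, calling first_requested_url(response) next to response.url would give the start URL and the final URL as a pair. If the key is missing, no redirect took place and the two are the same.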