Question

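I'm a beginner. I'm using Scrapy on a website that uses cookies, and that is a problem for me: I can scrape data from a site that doesn't need cookies, but getting data from a site that does need them is hard for me. I have this code structure: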

class mySpider(BaseSpider):
    name='data'
    allowed_domains =[]
    start_urls =["http://...."]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')

        vlrs =[]

        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem

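This works fine: with this code structure I can get data from pages that don't need cookies. I found out how to send cookies with a request to this URL, but I don't understand where I should put that code so that I can then extract the data with XPath.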

I am testing this code:

request_with_cookies = Request(url="http://...",cookies={'country': 'UY'})

But I don't know where this code belongs or how to make it work. I put it inside the parse function to get the data:

def parse(self, response):
    request_with_cookies = Request(url="http://.....",cookies={'country':'UY'})

    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies

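I tried to run XPath against this new cookie-enabled request so that I could print the newly scraped data afterwards. I thought it would work the same way as a URL without cookies, but when I run it I get an error saying that the 'Request' object has no attribute 'body_as_unicode'. What is the correct way to work with these cookies? I'm a bit lost. Thank you very much.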

Answer 1

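You are very close! The contract of the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is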

yield request_with_cookies

and your parse() method will be run again, this time with a Response object produced by requesting that URL with those cookies.

http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html

What is the correct way to work with cookies in Scrapy
