Question

I hope you are doing well.

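I am trying to scrape this website with the Scrapy framework. I only need the result titles written to a CSV file. At first glance the task looked simple, but when I tried it in the scrapy shell I found that the response object was empty. Here is a screenshot: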

screenshot of the result using the scrapy shell

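I guessed that the website loads its results through AJAX requests. I checked the browser's developer tools and confirmed that it does. I reverse engineered it and found the request URL of the AJAX call: https://www.tineye.com/search/get_domains/9fdeed61d697e871c74e38116d9c41276bce052e?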

Here is an image showing what I did:

screenshot of the developers tool tab
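A request replayed outside the browser usually lacks the headers the browser sent with the original AJAX call, and some servers respond differently without them. The sketch below builds a header set of that kind. It is only an assumption that this endpoint checks any of these headers; `ajax_headers` is a hypothetical helper, and the actual values should be copied from the developer tools rather than taken from here.

```python
def ajax_headers(user_agent: str) -> dict:
    """Build headers resembling a browser-issued AJAX request.

    Hypothetical helper: whether the server inspects any of these
    headers is an assumption, not something confirmed for this site.
    """
    return {
        # Sent by many JavaScript libraries on XMLHttpRequest calls.
        "X-Requested-With": "XMLHttpRequest",
        # Ask for JSON first, fall back to anything.
        "Accept": "application/json, */*;q=0.8",
        # Replace with the User-Agent string shown in the developer tools.
        "User-Agent": user_agent,
    }


# In a spider, the dict could be passed along with the request, e.g.
#   yield scrapy.Request(url, headers=ajax_headers("..."), callback=self.parse)
headers = ajax_headers("Mozilla/5.0 (placeholder)")
```

If the response is still empty with matching headers, the endpoint may also depend on cookies or a session created by loading the search page first, which this sketch does not cover.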

I then changed my code to request the URL of the AJAX call instead of the page itself.

import scrapy
import json

#cpt = 0

class SearchSpider(scrapy.Spider):
    name = "search"
    allowed_domains = ["www.tineye.com"]
    start_urls = ["https://www.tineye.com/search/get_domains/9fdeed61d697e871c74e38116d9c41276bce052e?"]

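    # parse must be a method of SearchSpider, so it is indented under the class.
    def parse(self, response):
        #global cpt
        # The endpoint is expected to return JSON with a "domains" list.
        data = json.loads(response.text)
        data = data["domains"]
        for item in data:
            #cpt = cpt + int(item[1])
            yield {
                "link": item[0],
                "times": item[1],
            }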

But once again, the response object is empty. What should I do? Thanks in advance.
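To tell a fetching problem apart from a parsing problem, the parsing step can be pulled out of the spider and run against a saved response body. The following is a minimal sketch, assuming the endpoint returns JSON shaped like `{"domains": [[link, times], ...]}`, which is what the spider above expects. `parse_domains`, `rows_to_csv`, and `SAMPLE_BODY` are hypothetical names, and the sample data is made up for illustration.

```python
import csv
import io
import json


def parse_domains(body: str) -> list[dict]:
    """Convert a response body into rows; return [] if it is empty or not JSON."""
    if not body.strip():
        return []  # empty body: nothing to parse
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []  # e.g. an HTML error page instead of JSON
    if not isinstance(payload, dict):
        return []
    # Assumed shape: each entry is a [link, times] pair.
    return [{"link": item[0], "times": item[1]} for item in payload.get("domains", [])]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["link", "times"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample body in the assumed shape.
SAMPLE_BODY = '{"domains": [["example.com", 3], ["example.org", 1]]}'
print(rows_to_csv(parse_domains(SAMPLE_BODY)))
```

If this works on a body saved from the browser but the spider still yields nothing, logging `response.status` and `len(response.body)` inside `parse` should show whether the server is returning an empty or non-JSON body to Scrapy. Once items are yielded, `scrapy crawl search -o results.csv` writes them to a CSV file.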

Scrapy: empty response object

0 answers: