Question

I'm new to Scrapy, and although I've put a lot of effort into it over the last few days, I'm still failing at the basics.

I'm trying to scrape the following site: https://blogabet.com/tipsters. My goal is to download all the links to user profiles, for example https://sabobic.blogabet.com/

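When I use scrapy shell, I can extract the specific XPath. But when I put it in a script and start it with "scrapy crawl ....", I always get no results: INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)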

What is wrong with my code?

import scrapy
from scrapy import Request

class BlogmeSpider(scrapy.Spider):
    name = 'blogme'


    def start_request(self):

        url = "https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=0"

        headers={
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,pl;q=0.8,de;q=0.7',
            'Connection': 'keep-alive',
            'Host': 'blogabet.com',
            'Referer': 'https://blogabet.com/tipsters',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }

        yield scrapy.http.Request(url, headers=headers)



    def parse(self, response):
        username = response.xpath('//*[@class="e-mail u-db u-mb1 text-ellipsis"]/a/@href').extract()
        yield {'username': username}
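
One possible cause, offered as a guess rather than a confirmed diagnosis: Scrapy calls the spider hook named `start_requests` (plural), while the method above is named `start_request`. If that is the case, the custom method is never called, and because the spider defines no `start_urls`, the inherited default yields nothing, which would match the "Crawled 0 pages" log line. The sketch below illustrates the mechanism with a simplified stand-in base class. `MiniSpider` and `crawl` are hypothetical names used for illustration, not Scrapy's implementation.

```python
class MiniSpider:
    """Simplified stand-in for a crawler base class (hypothetical, not Scrapy's code)."""

    start_urls = []

    def start_requests(self):
        # Default hook: one "request" per entry in start_urls.
        for url in self.start_urls:
            yield url


def crawl(spider):
    """Collect the initial requests the way a framework would: by hook name."""
    return list(spider.start_requests())


class MisnamedSpider(MiniSpider):
    # Singular name: this defines a brand-new method that crawl() never calls.
    def start_request(self):
        yield "https://blogabet.com/tipsters"


class FixedSpider(MiniSpider):
    # Plural name: this overrides the hook that crawl() looks up.
    def start_requests(self):
        yield "https://blogabet.com/tipsters"


print(crawl(MisnamedSpider()))  # []
print(crawl(FixedSpider()))     # ['https://blogabet.com/tipsters']
```

If this is indeed the cause, renaming the method in the spider to `start_requests` should be enough for the request with the custom headers to be sent.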

New to Scrapy - INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

0 answers: