How to pass arguments to a Scrapy spider and initialize it from within Python

Time: 2019-05-19 13:06:30

Tags: python python-3.x scrapy

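I am trying to pass the variable screen_name to my spider, because this screen_name changes on every run. (The ultimate goal is to have multiple spiders running with different screen names.)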

I initialize it like this:

process.crawl(TwitterSpider(screen_name="realDonaldTrump"))

But I get the following error:

  

spider = cls(*args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'screen_name'

import scrapy
from scrapy.crawler import CrawlerProcess

class TwitterSpider(scrapy.Spider):
    name = "twitter_friends"


    def __init__(self, screen_name, *args, **kwargs):
        self.usernames = []
        self.screen_name = screen_name
        super().__init__(*args, **kwargs)


    def start_requests(self):

        base_url = "https://mobile.twitter.com"
        urls = [
            base_url + '/{screen_name}/following'.format(screen_name=self.screen_name),
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def closed(self, spider):
        print("spider closed")

    def parse(self, response):
        pass


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})


process.crawl(TwitterSpider(screen_name="realDonaldTrump"))
process.start() # the script will block here until the crawling is finished

This is not a question about how to run it from the command line, only about how to run it from within Python.

1 Answer:

Answer 0 (score: 1)

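You can pass the Spider class itself, together with its arguments, to the crawl method. Scrapy then creates the spider instance for you and forwards the keyword arguments to its __init__. For example: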

process.crawl(TwitterSpider, screen_name="realDonaldTrump")
process.start() 
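
To see why passing an instance fails, here is a rough, simplified model of what happens behind crawl. It is not Scrapy's actual code: MiniSpider and mini_crawl are hypothetical stand-ins invented for illustration, and only the `cls(*args, **kwargs)` line is taken from the traceback in the question. The assumption is that the spider is built through a classmethod, so even when an instance is handed over, the class gets called again, this time with no arguments.

```python
class MiniSpider:
    """Simplified stand-in for scrapy.Spider (hypothetical, not the real class)."""

    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Same line as in the traceback: the spider is constructed here,
        # from the class, with whatever arguments were forwarded.
        spider = cls(*args, **kwargs)
        spider.crawler = crawler
        return spider


class TwitterSpider(MiniSpider):
    def __init__(self, screen_name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.screen_name = screen_name


def mini_crawl(spidercls, *args, **kwargs):
    """Hypothetical model of process.crawl: build the spider from its class."""
    return spidercls.from_crawler(None, *args, **kwargs)


# Passing the class plus keyword arguments: screen_name reaches __init__.
spider = mini_crawl(TwitterSpider, screen_name="realDonaldTrump")

# Passing an instance: from_crawler is a classmethod, so cls is still the
# TwitterSpider class, and it is called a second time with no arguments.
try:
    mini_crawl(TwitterSpider(screen_name="realDonaldTrump"))
    error = None
except TypeError as exc:
    error = str(exc)

print(spider.screen_name)
print(error)
```

For the goal of running several spiders with different screen names, the usual pattern is to call process.crawl(TwitterSpider, screen_name=...) once per screen name and then call process.start() a single time at the end.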