Scrapy: running multiple spiders on scrapyd - Python logic error

Time: 2018-01-25 12:44:59

Tags: python scrapy scrapyd

Scrapy 1.4

I'm using this script (Run multiple scrapy spiders at once using scrapyd) to schedule multiple spiders on Scrapyd. It ran fine before, when I was using Scrapy 0.19.

I'm getting an error:

class AllCrawlCommand(ScrapyCommand):
    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):
        cursor = get_db_connection()
        cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
        rows = cursor.fetchall()

        # Put every site domain into a list; further down I check
        # so that only the available spiders whose name matches a
        # site domain are run
        sites = []
        for row in rows:
            site = row[2]
            print(site)

            # add each site to the list
            sites.append(site)

        url = 'http://localhost:6800/schedule.json'
        crawler = self.crawler_process.create_crawler()
        for s in crawler.spiders.list():
            if s in sites:
                values = {'project': 'esportifique', 'spider': s}
                r = requests.post(url, data=values)
                print(r.text)
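As an aside, Scrapyd's schedule.json endpoint replies with a small JSON body ({"status": "ok", "jobid": "..."} on success, {"status": "error", "message": "..."} on failure), so printing r.text is a good way to see what went wrong. A minimal sketch of checking that reply (the helper name is my own, not part of the script above):

```python
import json

# Minimal sketch: inspect a scrapyd schedule.json reply.
# On success scrapyd returns {"status": "ok", "jobid": "..."},
# on failure {"status": "error", "message": "..."}.
def schedule_ok(response_text):
    body = json.loads(response_text)
    return body.get('status') == 'ok'

print(schedule_ok('{"status": "ok", "jobid": "6487ec79947edab3"}'))  # True
print(schedule_ok('{"status": "error", "message": "spider not found"}'))  # False
```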

So now I don't know whether the problem is the Scrapy version or just a simple Python logic issue (I'm new to Python).

I made some changes to check whether the spider is active in the database.


1 answer:

Answer 0: (score: 1)

Based on the link parik suggested, here is what I did:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess
import requests

setting = get_project_settings()
process = CrawlerProcess(setting)

url = 'http://localhost:6800/schedule.json'

cursor = get_db_connection()
cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
rows = cursor.fetchall()

# Put every site domain into a list; further down I check
# so that only the available spiders whose name matches a
# site domain are run
sites = []
for row in rows:
    site = row[2]
    print(site)

    # add each site to the list
    sites.append(site)

for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    #process.crawl(spider_name, query="dvh")  # query "dvh" is a custom argument used in your scrapy
    if spider_name in sites:
        values = {'project': 'esportifique', 'spider': spider_name}
        r = requests.post(url, data=values)
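The matching step above (only schedule spiders whose name appears in the sites list) can also be isolated as a pure function, which makes it easy to check without a database or a running Scrapyd. The helper name and sample spider names below are made up for illustration:

```python
# Hypothetical helper: build the schedule.json payloads for the
# spiders whose name matches one of the available site domains.
def payloads_for(spider_names, sites, project='esportifique'):
    available = set(sites)
    return [{'project': project, 'spider': name}
            for name in spider_names if name in available]

print(payloads_for(['loja_a', 'loja_b', 'loja_c'], ['loja_b', 'loja_c']))
# [{'project': 'esportifique', 'spider': 'loja_b'}, {'project': 'esportifique', 'spider': 'loja_c'}]
```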