Scrapy: ImportError: No module named project_name.settings

Asked: 2015-06-28 19:04:46

Tags: python web-crawler scrapy

I am trying to create a script that runs several spiders, but I get ImportError: No module named project_name.settings

My script looks like this:

import os
os.system("scrapy crawl spider1")
os.system("scrapy crawl spider2")
....
os.system("scrapy crawl spiderN")

My settings.py:

# -*- coding: utf-8 -*-

# Scrapy settings for project_name
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#

BOT_NAME = 'project_name'

ITEM_PIPELINES = {
    'project_name.pipelines.project_namePipelineToJSON': 300,
    'project_name.pipelines.project_namePipelineToDB': 800
}

SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'project_name (+http://www.yourdomain.com)'

My spiders look like ordinary spiders; they are actually very simple...

import scrapy
from scrapy.crawler import CrawlerProcess
from Projectname.items import ProjectnameItem

class ProjectnameSpiderClass(scrapy.Spider):
    name = "Projectname"
    allowed_domains = ["Projectname.com"]

    start_urls = ["...urls..."]


    def parse(self, response):
        item = ProjectnameItem()
        # ... field assignments elided in the original post ...
        yield item

I gave them generic names, but you get the idea. Is there a way to fix this error?

1 Answer:

Answer 0 (score: 1):

Edit 2018:

You need to run the spiders from the project folder: the script that calls os.system("scrapy crawl spider1") must be executed from the directory that contains the project for spider1 (the one where scrapy.cfg lives), otherwise Scrapy cannot import project_name.settings.
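A minimal sketch of that fix, under assumptions: /path/to/project_name is a placeholder for the directory containing scrapy.cfg, and the spider names are the placeholders from the question.

import os

# Placeholder path: the directory that contains scrapy.cfg for the project
PROJECT_DIR = "/path/to/project_name"

# Switch into the project directory so "scrapy crawl" can find project_name.settings
os.chdir(PROJECT_DIR)

# Placeholder spider names from the question
for spider_name in ["spider1", "spider2", "spiderN"]:
    os.system("scrapy crawl " + spider_name)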

Or you can do what I did in the past and put all the code in a single file (old answer, no longer recommended by me, but still a useful and decent solution).


Well, in case anyone comes across this question: I ended up using a heavily modified version of this gist by alecxe, https://gist.github.com/alecxe/fc1527d6d9492b59c610, which he provided in another question. Hope this helps.
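As a rough illustration of that single-script approach (a sketch under assumptions, not the code from the gist), Scrapy's CrawlerProcess can drive several spiders from one file when run from the project root; the spider names below are the placeholders from the question.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load project_name.settings via scrapy.cfg in the current working directory
process = CrawlerProcess(get_project_settings())

# Queue each spider by its `name` attribute (placeholder names from the question)
process.crawl("spider1")
process.crawl("spider2")

# Start the reactor; this blocks until every queued crawl has finished
process.start()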