Question

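I found an interesting scraper on GitHub: https://github.com/apetz/email-scraper

It is a spider that scrapes email addresses from websites.

The scraper is invoked from the command line, with the website passed as an argument: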

scrapy crawl spider -a domain="your.domain.name" -o emails-found.csv

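I would like to modify this scraper so that it stores the emails in a database instead of a JSON file.

To do that, I tried to get at the "domain" argument of the "ThoroughSpider" class in /spiders/thorough_spider.py.

So, in my pipelines.py file, I wrote: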

 import spiders.thorough_spider

in order to import the thorough_spider module, which contains the ThoroughSpider.domain variable.

But PyCharm tells me: "No module named spiders".

So I tried this line instead:

 from spiders import thorough_spider

This time PyCharm tells me: "Unresolved reference 'spiders'".

Here is the code of thorough_spider.py, located in the "spiders" folder:

class ThoroughSpider(scrapy.Spider):
    name = "spider"

    def __init__(self, domain=None, subdomain_exclusions=[], crawl_js=False):
        self.allowed_domains = [domain]
        start_url = "http://" + domain

        self.start_urls = [
            start_url
        ]

And here is the code in my pipelines.py, which sits one level above the "spiders" folder:

from scrapy.exceptions import DropItem
import mysql.connector

import spiders.thorough_spider
from spiders import thorough_spider

Do you know how I can get the domain that was passed as an argument from inside my pipelines.py?

Answer 1

If you want to import a module from the current package, you can use a leading dot (a relative import).

So you can try:

from .spiders.thorough_spider import ThoroughSpider

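This should work as long as pipelines.py is imported as part of the project package (which is how Scrapy loads it) rather than run directly as a script.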
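Note that the import only gives you the class, while the domain is set on the spider instance inside __init__. As far as I know, Scrapy passes the running spider to pipeline hooks such as open_spider and process_item, so the value can be read from there. Below is a minimal sketch under stated assumptions: the class name EmailToDatabasePipeline and the item field "email" are hypothetical, sqlite3 stands in for mysql.connector so that the example runs without a server, and a SimpleNamespace object stands in for the real spider. It reads allowed_domains[0] because that is where the __init__ shown in the question stores the domain.

```python
import sqlite3
from types import SimpleNamespace


class EmailToDatabasePipeline:
    """Hypothetical pipeline: stores each scraped email with the crawl domain."""

    def open_spider(self, spider):
        # The __init__ from the question keeps the domain in allowed_domains.
        self.domain = spider.allowed_domains[0]
        # Assumption: sqlite3 replaces mysql.connector for this demo only.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE emails (domain TEXT, email TEXT)")

    def process_item(self, item, spider):
        # "email" is an assumed field name; adapt it to the real item.
        self.conn.execute(
            "INSERT INTO emails (domain, email) VALUES (?, ?)",
            (self.domain, item["email"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()


# Stand-in for the running ThoroughSpider instance (no Scrapy needed here).
fake_spider = SimpleNamespace(allowed_domains=["your.domain.name"])
pipeline = EmailToDatabasePipeline()
pipeline.open_spider(fake_spider)
pipeline.process_item({"email": "someone@your.domain.name"}, fake_spider)
rows = pipeline.conn.execute("SELECT domain, email FROM emails").fetchall()
print(rows)
```

In a real project the pipeline would also have to be enabled through ITEM_PIPELINES in settings.py, and the pipeline hook signatures should be checked against the documentation for your Scrapy version.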

Answer 2

Try:

from scraper.spiders import thorough_spider

replacing "scraper" with the name of your project if it differs.
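The bare `import spiders.thorough_spider` most likely fails because `spiders` is a subpackage of the project package, not a top-level module. The sketch below rebuilds a throwaway project in a temporary directory to show that both the absolute and the relative form resolve once the project root is importable. The layout (a top-level package called "scraper" with `__init__.py` files) is an assumption about the project, and the spider here is a plain class rather than a real scrapy.Spider.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumed layout, mirrored in a temporary directory:
#   scraper/__init__.py
#   scraper/pipelines.py
#   scraper/spiders/__init__.py
#   scraper/spiders/thorough_spider.py
root = Path(tempfile.mkdtemp())
spiders_dir = root / "scraper" / "spiders"
spiders_dir.mkdir(parents=True)
(root / "scraper" / "__init__.py").write_text("")
(spiders_dir / "__init__.py").write_text("")
(spiders_dir / "thorough_spider.py").write_text(
    "class ThoroughSpider:\n    name = 'spider'\n"
)
(root / "scraper" / "pipelines.py").write_text(
    "from scraper.spiders import thorough_spider\n"          # absolute import
    "from .spiders.thorough_spider import ThoroughSpider\n"  # relative import
    "SPIDER_NAME = thorough_spider.ThoroughSpider.name\n"
)

# The directory that contains the "scraper" package must be on sys.path.
# Here it is added by hand; running scrapy from the project root should
# have the same effect.
sys.path.insert(0, str(root))
pipelines = importlib.import_module("scraper.pipelines")
print(pipelines.SPIDER_NAME)
```

If PyCharm still flags the import, marking the directory that contains the project package as a sources root is usually enough for its resolver to find it.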

How do I import a variable of the spider class from my pipelines.py file?
