Question

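I've looked everywhere for an answer and can't find one. As I mentioned yesterday, I'm new to Scrapy and Python, so the answer may well be out there and I'm just not grasping it.

I wrote my spider and it works fine. Here is my pipeline...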

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class somepipeline(object):
    def __init__(self):
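        # Every connection parameter is passed by keyword; positional values cannot follow a keyword argument.
        self.conn = MySQLdb.connect(user='user', passwd='passwd', db='dbname', host='host', charset="utf8", use_unicode=True)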
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):    
        try:
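            # Three columns need three placeholders; `desc` is a reserved word in MySQL, so it is backtick-quoted.
            self.cursor.execute("""INSERT INTO sometable (title, link, `desc`)
                            VALUES (%s, %s, %s)""",
                           (item['title'].encode('utf-8'),
                            item['link'].encode('utf-8'),
                            item['desc'].encode('utf-8')))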

            self.conn.commit()
        except MySQLdb.Error as e:
            print("Error %d: %s" % (e.args[0], e.args[1]))
        return item

Here are my settings:

BOT_NAME = 'somebot'

SPIDER_MODULES = ['somespider.spiders']
NEWSPIDER_MODULE = 'somespider.spiders'
ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

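However, when I run this I get a "No module named pipeline" error.

I found a similar answer, but it was for an image class; I only want the HTML data.

What am I doing wrong? Do I need to download another module or something else? Any help is appreciated. If I'm close, just give me a nudge.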

Answer 1

The Scrapy tutorial has a typo: the module name must be 'pipelines' (plural):

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']
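
As a side note, newer Scrapy releases expect ITEM_PIPELINES to be a dict that maps each class path to an integer order, rather than a list. A minimal sketch of that form, assuming the project package really is named myproject; the value 300 is an arbitrary priority picked for illustration:

```python
# settings.py (sketch): dict form of ITEM_PIPELINES.
# Key: dotted path to the pipeline class.
# Value: run order, conventionally in the 0-1000 range, lower runs first.
# 300 is an arbitrary value chosen for this example.
ITEM_PIPELINES = {
    'myproject.pipelines.somepipeline': 300,
}
```

With a single pipeline the number does not matter much; it only affects ordering once several pipelines are enabled.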

Answer 2

There is no "pipeline" file. It should be "pipelines". So you need to change

ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

to

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']
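
If it is unclear which spelling matches the files on disk, the dotted path can be checked outside of Scrapy. The following is only a sketch: check_dotted_path is a hypothetical helper (not part of Scrapy) that splits a path such as 'myproject.pipelines.somepipeline' into module and class and tries to import it with the standard library. It assumes it is run from the directory that contains scrapy.cfg, so that the project package is importable.

```python
import importlib


def check_dotted_path(path):
    """Return (ok, message) for a 'package.module.ClassName' style path.

    Hypothetical helper for debugging; it approximates the lookup a
    settings entry needs, using only the standard library.
    """
    module_path, _, class_name = path.rpartition(".")
    if not module_path:
        return False, "not a dotted path: %r" % path
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # This is the "No module named ..." case from the question.
        return False, "cannot import %s: %s" % (module_path, exc)
    if not hasattr(module, class_name):
        return False, "%s has no attribute %s" % (module_path, class_name)
    return True, "ok"
```

Calling check_dotted_path('myproject.pipeline.somepipeline') and then the same with 'pipelines' should show which module name actually exists in the project.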

Answer 3

The correct directory layout should look like this:

myproject/
     scrapy.cfg  
     myproject/
         __init__.py
         items.py
         pipeline.py
         settings.py
         spiders/
            spider.py

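Also, can you confirm that your spider is working correctly? For example, if you comment out the ITEM_PIPELINES setting, does your spider run and produce the expected output?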
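
A related way to narrow things down is to swap in a pipeline that does nothing but echo items. The class below is a sketch with a made-up name (DebugPipeline); it has no database code, so if Scrapy loads it and items are printed, the settings path is fine and any remaining problem is in the MySQL code.

```python
class DebugPipeline(object):
    """Hypothetical stand-in pipeline: prints and remembers items, no database."""

    def __init__(self):
        # Items seen so far, kept only so the behaviour is easy to inspect.
        self.seen = []

    def process_item(self, item, spider):
        self.seen.append(item)
        # dict(item) works for plain dicts and for dict-like item objects.
        print("got item: %r" % (dict(item),))
        # Returning the item passes it on to any later pipeline.
        return item
```

It would go in the same module as the real pipeline and be referenced from ITEM_PIPELINES in place of somepipeline while testing.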

Scrapy pipeline to MySQL - Can't find answer
