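I'm just getting started with Scrapy, and I'm trying to write to a MySQL database instead of outputting to a CSV file.
I found the code here: https://gist.github.com/tzermias/6982723 and I'm trying to get it to work, but unfortunately I'm getting an error that I can't make sense of.
Here is my pipelines.py: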
class WebsitePipeline(object):
    def process_item(self, item, spider):
        return item

import MySQLdb.cursors
from twisted.enterprise import adbapi
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.utils.project import get_project_settings
from scrapy import log

SETTINGS = get_project_settings()

class MySQLPipeline(object):
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

    def __init__(self, stats):
        # Instantiate DB
        self.dbpool = adbapi.ConnectionPool('MySQLdb',
            host=SETTINGS['DB_HOST'],
            user=SETTINGS['DB_USER'],
            passwd=SETTINGS['DB_PASSWD'],
            port=SETTINGS['DB_PORT'],
            db=SETTINGS['DB_DB'],
            charset='utf8',
            use_unicode=True,
            cursorclass=MySQLdb.cursors.DictCursor
        )
        self.stats = stats
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        """ Cleanup function, called after crawling has finished to close open
        objects.
        Close ConnectionPool. """
        self.dbpool.close()

    def process_item(self, item, spider):
        query = self.dbpool.runInteraction(self._insert_record, item)
        query.addErrback(self._handle_error)
        return item

    def _insert_record(self, tx, item):
        result = tx.execute(
            """ INSERT INTO table VALUES (1,2,3)"""
        )
        if result > 0:
            self.stats.inc_value('database/items_added')

    def _handle_error(self, e):
        log.err(e)
Here is my settings.py:
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'Website.pipelines.MySQLPipeline': 300,
}
#Database settings
DB_HOST = 'localhost'
DB_PORT = 3306
DB_USER = 'username'
DB_PASSWD = 'password'
DB_DB = 'scrape'
Here is spider.py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import SitemapSpider
class WebsitesitemapSpider(SitemapSpider):
    name = 'Websitesitemap'
    allowed_domains = ['Website.com']
    sitemap_urls = ['https://www.Website.com/robots.txt']

    def parse(self, response):
        yield {response.url}
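I haven't been able to find a working example of what I'm trying to do that would let me figure out where I'm going wrong, so thank you to anyone who takes a look or might be able to help.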
Answer 0 (score: 0)
Have you installed the packages MySQLdb, scrapy, and twisted?
If not, try installing them with pip and then run the script again.
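One quick way to check this before running the spider is to ask Python whether each module can be found. The sketch below is only illustrative: the helper name `missing_modules` is made up here, and it looks up top-level module names only, without checking versions or actually importing anything.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the imports in pipelines.py above.
    missing = missing_modules(["MySQLdb", "scrapy", "twisted"])
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All three modules were found.")
```

Run it with the same interpreter (or virtualenv) that runs `scrapy crawl`, since a package installed for a different Python will not be visible.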
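Answer 1 (score: 0)
You will need to install MySQL-python in your Python environment, as well as the MySQL client library (libmysql) on the operating system.
On Ubuntu, this can be done as follows.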
pip install MySQL-python
sudo apt-get install libmysqlclient-dev
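Once the driver imports cleanly, two details in the question's code may still be worth a look. First, `yield {response.url}` produces a Python set, and as far as I know Scrapy expects a dict or Item from a spider callback. Second, the INSERT statement writes fixed values rather than anything from the scraped item. A minimal sketch of both points, assuming a hypothetical table `pages` with a single `url` column (neither name comes from the question):

```python
def make_item(url):
    """Build the dict a spider callback would yield, e.g. `yield make_item(response.url)`."""
    return {"url": url}


def build_insert(item):
    """Return (sql, params) for a parameterized insert, for use as tx.execute(sql, params)."""
    # `pages` and `url` are hypothetical names; adjust them to the real schema.
    sql = "INSERT INTO pages (url) VALUES (%s)"
    return sql, (item["url"],)


if __name__ == "__main__":
    sql, params = build_insert(make_item("https://www.Website.com/"))
    print(sql, params)
```

Inside `_insert_record`, the call would then look like `tx.execute(*build_insert(item))`. Passing the values separately from the SQL string leaves quoting and escaping to the driver.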