Unable to connect Scrapy to my database

Time: 2017-09-14 11:38:36

Tags: python mongodb scrapy

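I have to run a crawler and store the scraped data in a database. Collecting the data already works, but I run into a problem when I try to write it to the database.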

My files are:

topcrawlerspider.py (my spider, which works):

from scrapy import Spider, Item, Field, Request
from ..items import TopcrawlerItem
from ..pipelines import TopcrawlerPipeline
import time

class TopSpider(Spider):

    name = 'topcrawler'
    start_urls = ['...']

    def __init__(self, page=0, *args, **kwargs):
        super(TopSpider, self).__init__(*args, **kwargs)
        self.search_result_url_tpl = 'http://.../%s'
    ...

settings.py:

BOT_NAME = 'topcrawler'

SPIDER_MODULES = ['topcrawler.spiders']
NEWSPIDER_MODULE = 'topcrawler.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'topcrawler (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

ITEM_PIPELINES = {
 'topcrawler.pipelines.TopcrawlerPipeline': 300,
 # 'topcrawler.pipelines.JsonWriterPipeline': 800,
}

MONGODB_URI = 'mongodb://root:root@127.0.0.1:8889/mtdbdd'
MONGO_DATABASE = 'mtdbdd'

pipelines.py:

import pymongo
from settings import *

class TopcrawlerPipeline(object):

    collection_name = 'land'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[self.collection_name].insert(dict(item))
        return item

I get this error:

ServerSelectionTimeoutError: localhost:27017: [Errno 8] nodename nor servname provided, or not known

It does not seem to connect to port 8889 as I intended, but I don't know why...

Thanks for your help!

1 answer:

Answer 0: (score: 0)

In the TopcrawlerPipeline class, in the open_spider method (in the pipelines.py file), you create the client twice:

self.client = pymongo.MongoClient(connect=False)
self.client = pymongo.MongoClient('mongodb://root:root@127.0.0.1:8889/mtdbdd')

I would bet the error comes from the first one (which I assume is unintentional). Remove the first one and keep only the second.
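Separately from the duplicated client, the files quoted in the question suggest another way the URI can go missing: settings.py defines MONGODB_URI, while from_crawler reads MONGO_URI. A lookup of an undefined setting returns None, and MongoClient(None) falls back to localhost:27017, which matches the host and port in the traceback. The sketch below reproduces the lookup with a plain dict standing in for crawler.settings. The dict and the helper name get_required_setting are assumptions made for illustration and are not part of Scrapy.

```python
# A plain dict standing in for crawler.settings (an assumption for this sketch),
# filled with the values quoted in the question's settings.py.
settings = {
    "MONGODB_URI": "mongodb://root:root@127.0.0.1:8889/mtdbdd",
    "MONGO_DATABASE": "mtdbdd",
}


def get_required_setting(settings, name):
    """Hypothetical helper: return settings[name], raising instead of yielding None."""
    value = settings.get(name)
    if value is None:
        raise KeyError(
            "setting %r is not defined; available: %s" % (name, sorted(settings))
        )
    return value


# The name read in from_crawler is not the name defined in settings.py,
# so the plain lookup silently returns None.
uri_as_read = settings.get("MONGO_URI")

# The name that settings.py actually defines.
uri_as_defined = get_required_setting(settings, "MONGODB_URI")
```

If this is what happens in your project, either rename the setting to MONGO_URI or read 'MONGODB_URI' in from_crawler. Failing loudly on a missing setting makes this kind of mismatch visible at startup rather than at connection time.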

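Just a side note on where the error may come from. If no connection string is passed to MongoClient, it tries to connect to localhost on the default port 27017. Check your /etc/hosts file to see how localhost is defined (I assume you are on Linux). On some systems, localhost is assigned only an IPv6 address, and by default MongoDB does not listen on IPv6 addresses.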
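To check how localhost resolves on your machine without reading /etc/hosts by hand, a minimal sketch using only the standard library could look like this. The function name resolve_host is an assumption made for illustration. An empty result would point to a name lookup failure, which is the kind of failure the "nodename nor servname provided" message in the question usually indicates.

```python
import socket


def resolve_host(host, port):
    """Return the sorted IP addresses that `host` resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed (for example, no usable entry for the host name).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host('localhost', 27017) shows whether localhost maps to 127.0.0.1, to ::1, to both, or to nothing. If only ::1 appears, using 127.0.0.1 explicitly in the connection string avoids the IPv6 lookup.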