我正在尝试使用python 3.6在Ubuntu 16上使用scrapy。当我运行“scrapy crawl mySpiderName”时,它显示错误:
2018-06-17 11:18:50 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: ToutiaoAppSpider)
2018-06-17 11:18:50 [scrapy.utils.log] INFO: Versions: lxml 3.5.0.0, libxml2 2.9.3, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 18.4.0, Python 3.5.2 (default, Nov 23 2017, 16:37:01) - [GCC 5.4.0 20160609], pyOpenSSL 18.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Linux-4.13.0-43-generic-x86_64-with-Ubuntu-16.04-xenial
2018-06-17 11:18:50 [scrapy.crawler] INFO: Overridden settings: {'SPIDER_MODULES': ['ToutiaoAppSpider.spiders'], 'CONCURRENT_REQUESTS_PER_IP': 10, 'DOWNLOAD_TIMEOUT': 15, 'CONCURRENT_REQUESTS': 10, 'NEWSPIDER_MODULE': 'ToutiaoAppSpider.spiders', 'BOT_NAME': 'ToutiaoAppSpider', 'CONCURRENT_REQUESTS_PER_DOMAIN': 10}
2018-06-17 11:18:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.memusage.MemoryUsage']
2018-06-17 11:18:50 [ToutiaoHao] INFO: Reading start URLs from redis key 'ToutiaoHao:start_urls' (batch size: 10, encoding: utf-8
2018-06-17 11:18:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-06-17 11:18:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
Unhandled error in Deferred:
2018-06-17 11:18:50 [twisted] CRITICAL: Unhandled error in Deferred:
2018-06-17 11:18:50 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 80, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 105, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python3.5/dist-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python3.5/dist-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python3.5/dist-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.5/dist-packages/scrapy/middleware.py", line 36, in from_settings
mw = mwcls.from_crawler(crawler)
File "/home/tuijian/crawler/ToutiaoAppSpider/ToutiaoAppSpider/pipelines.py", line 34, in from_crawler
redis_host=crawler.settings['REDIS_HOST'])
File "/home/tuijian/crawler/ToutiaoAppSpider/ToutiaoAppSpider/pipelines.py", line 17, in __init__
self.db = connect[mongodb_db]
NameError: name 'connect' is not defined
我在堆栈溢出时搜索了这种情况,但大多数问题都不适合我的问题。
Settings.py内容如下所示:
# -*- coding: utf-8 -*-
# Scrapy settings for ToutiaoAppSpider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'ToutiaoAppSpider'
SPIDER_MODULES = ['ToutiaoAppSpider.spiders']
NEWSPIDER_MODULE = 'ToutiaoAppSpider.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'ToutiaoAppSpider (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 10
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 10
CONCURRENT_REQUESTS_PER_IP = 10
DOWNLOAD_TIMEOUT = 15
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'ToutiaoAppSpider.middlewares.ToutiaoappspiderSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None
#'ToutiaoAppSpider.middlewares.ProxyMiddleware': 543,
#'ToutiaoAppSpider.middlewares.Push2RedisMiddleware': 544,
#'ToutiaoAppSpider.middlewares.SkipYg365Middleware': 545,
}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
'ToutiaoAppSpider.pipelines.ToutiaoappspiderPipeline': 300,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
HEADERS = {
'Host': 'is.snssdk.com',
'Connection': 'keep-alive',
'Accept-Encoding': 'gzip',
'request_timestamp_client': '332763604',
'X-SS-REQ-TICKET': '',
'User-Agent': 'Dalvik/2.1.0 (Linux; U; Android 7.1.2; Redmi 4X MIUI/8.2.1) NewsArticle/6.5.8 cronet/58.0.2991.0'
}
DETAIL_HEADERS = {
'Accept-Encoding': 'gzip',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'User-Agent': 'Mozilla/5.0 (Linux; Android 7.1.2; Redmi 4X Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/62.0.3202.84 Mobile Safari/537.36 JsSdk/2 NewsArticle/6.5.8 NetType/wifi',
'Connection': 'Keep-Alive'
}
COMPLEX_DETAIL_HEADERS = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
'cache-control': 'max-age=0',
'dnt': '1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
}
TOUTIAOHAO_HEADERS = {
'accept': 'application/json, text/javascript',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
'content-type': 'application/x-www-form-urlencoded',
'dnt': '1',
'referer': 'https://www.toutiao.com/c/user/{user_id}/',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
'x-requested-with': 'XMLHttpRequest'}
FEED_URL_old = 'https://is.snssdk.com/api/news/feed/v77/?list_count={list_count}&category={category}&refer=1&count=20' \
'{max_behot_time}&last_refresh_sub_entrance_interval=1519286082&loc_mode=7&tt_from=pre_load_more' \
'&plugin_enable=3&_d_s=1&iid=26133660951&device_id=37559129771&ac=wifi&channel=tianzhuo_toutiao_sg' \
'&aid=13&app_name=news_article&version_code=658&version_name=6.5.8&device_platform=android' \
'&ab_version=281719%2C278039%2C249665%2C249684%2C249686%2C249642%2C249670%2C249673%2C281732%2C229304%2C249671' \
'%2C282686%2C282218%2C275584%2C277466%2C281426%2C280418%2C232362%2C265707%2C279809%2C239097%2C170988%2C281158' \
'%2C269426%2C273499%2C279386%2C281391%2C281612%2C276203%2C281098%2C257281%2C281472%2C280149%2C277718%2C278670' \
'%2C271717%2C259492%2C280773%2C282147%2C272683%2C251795%2C276758%2C282776%2C251713%2C280097%2C282669%2C31210' \
'%2C279129%2C270335%2C280969%2C227649%2C280220%2C264034%2C258356%2C247850%2C280448%2C282714%2C281293%2C278160' \
'%2C249045%2C244746%2C264615%2C260657%2C241181%2C282157%2C271178%2C252767%2C249828%2C246859' \
'&ab_client=a1%2Cc4%2Ce1%2Cf2%2Cg2%2Cf7&ab_group=100167&ab_feature=94563%2C102749&abflag=3&ssmix=a' \
'&device_type=Redmi+4X&device_brand=Xiaomi&language=zh&os_api=25&os_version=7.1.2&uuid=864698037116551' \
'&openudid=349f495868d2f06d&manifest_version_code=658&resolution=720*1280&dpi=320&update_version_code=65809' \
'&_rticket={_rticket}&plugin=10575&fp=TlTqLYFuLlXrFlwSPrU1FYmeFSwt&rom_version=miui_v9_8.2.1' \
'&ts={ts}&as={_as}&mas={mas}&cp={cp}'
FEED_URL = 'http://ib.snssdk.com/api/news/feed/v78/?fp=RrTqLWwuL2GSFlHSFrU1FYFeL2Fu&version_code=6.6.0' \
'&app_name=news_article&vid=49184A00-1B68-4FE9-9FD5-B24BED70032C&device_id=48155472608&channel=App%20Store' \
'&resolution=1242*2208&aid=13&ab_version=264071,275782,275652,271178,252769,249828,246859,285542,278039,' \
'249664,249685,249687,249675,249668,249669,249673,281730,229305,249672,285965,285209,283849,277467,283757,' \
'286568,285618,284439,280773,286862,286446,285535,251712,283794,285944,31210,285223,283836,286558,258356,' \
'247849,280449,284752,281296,249045,275612,278760,283491,264613,260652,286224,286477,261338,241181,283777,' \
'285703,285404,283370,286426,239096,286955,170988,273497,285788,279386,281389,276203,286300,286056,286878,' \
'257282,281472,277769,280147,284908&ab_feature=201616,z1&ab_group=z1,201616' \
'&openudid=a8b364d577ac6c59e96dbcf3cc57d9c4eaa9420c&idfv=49184A00-1B68-4FE9-9FD5-B24BED70032C&ac=WIFI' \
'&os_version=11.2.5&ssmix=a&device_platform=iphone&iid=25922108613&ab_client=a1,f2,f7,e1' \
'&device_type=iPhone%208%20Plus&idfa=2F0B6A5F-B939-4EB5-9D51-D81698D8729D&refresh_reason=4&category={category}' \
'&last_refresh_sub_entrance_interval=1519794464&detail=1&tt_from=unknown&count=20&list_count={list_count}' \
'&LBS_status=authroize&loc_mode=1&cp={cp}&{max_behot_time}&session_refresh_idx=1&image=1&strict=0&refer=1' \
'&city=%E5%8C%97%E4%BA%AC&concern_id=6215497896830175745&language=zh-Hans-CN&st_time=67&as={_as}&ts={ts}'
DETAIL_URL = 'http://is.snssdk.com/2/article/information/v23/?fp=RrTqLWwuL2GSFlHSFrU1FYFeL2Fu&version_code=6.6.0' \
'&app_name=news_article&vid=49184A00-1B68-4FE9-9FD5-B24BED70032C&device_id=48155472608' \
'&channel=App%20Store&resolution=1242*2208&aid=13&ab_version=264071,275782,275652,271178,252769,249828,' \
'246859,283790,278039,249664,249685,249687,249675,249668,249669,249673,281730,229305,249672,283849,' \
'277467,283757,284439,280773,283091,283787,282775,251712,283794,280097,284801,31210,283489,283836,280966,' \
'280220,258356,247849,280449,284752,281296,281724,249045,281414,275612,278760,283491,264613,282974,260652,' \
'261338,241181,283777,284201,282897,283370,239096,170988,269426,273497,279386,281389,281615,276203,279014,' \
'257282,281472,277769,280147&ab_feature=201616,z1&ab_group=z1,201616' \
'&openudid=a8b364d577ac6c59e96dbcf3cc57d9c4eaa9420c&idfv=49184A00-1B68-4FE9-9FD5-B24BED70032C' \
'&ac=WIFI&os_version=11.2.5&ssmix=a&device_platform=iphone&iid=25922108613&ab_client=a1,f2,f7,e1' \
'&device_type=iPhone%208%20Plus&idfa=2F0B6A5F-B939-4EB5-9D51-D81698D8729D&article_page=0' \
'&group_id={group_id}&device_id=48155472608&aggr_type=1&item_id={item_id}&from_category=__all__' \
'&as={_as}&ts={ts}'
COMMENTS_URL = 'https://is.snssdk.com/article/v2/tab_comments/?group_id={group_id}&item_id={item_id}&aggr_type=1' \
'&count=20&offset={offset}&tab_index=0&fold=1&iid=26133660951&device_id=37559129771&ac=wifi' \
'&channel=tianzhuo_toutiao_sg&aid=13&app_name=news_article&version_code=658&version_name=6.5.8' \
'&device_platform=android&ab_version=281719%2C278039%2C249665%2C249684%2C249686%2C283244%2C249642' \
'%2C249670%2C249673%2C281732%2C229304%2C249671%2C282686%2C282218%2C275584%2C277466%2C281426' \
'%2C280418%2C282898%2C232362%2C265707%2C279809%2C239097%2C170988%2C281158%2C269426%2C273499%2C279386' \
'%2C281391%2C281612%2C276203%2C281098%2C257281%2C281472%2C280149%2C277718%2C283104%2C271717%2C259492' \
'%2C283184%2C280773%2C282147%2C272683%2C251795%2C283177%2C282776%2C251713%2C280097%2C282669%2C31210' \
'%2C283097%2C283138%2C270335%2C280969%2C227649%2C280220%2C264034%2C258356%2C247850%2C280448%2C283165' \
'%2C281293%2C278160%2C249045%2C244746%2C264615%2C282973%2C260657%2C241181%2C282157%2C271178%2C252767' \
'%2C249828%2C246859&ab_client=a1%2Cc4%2Ce1%2Cf2%2Cg2%2Cf7&ab_group=100167&ab_feature=94563%2C102749' \
'&abflag=3&ssmix=a&device_type=Redmi+4X&device_brand=Xiaomi&language=zh&os_api=25&os_version=7.1.2' \
'&uuid=864698037116551&openudid=349f495868d2f06d&manifest_version_code=658&resolution=720*1280' \
'&dpi=320&update_version_code=65809&_rticket={_rticket}&plugin=10575&fp=TlTqLYFuLlXrFlwSPrU1FYmeFSwt' \
'&rom_version=miui_v9_8.2.1&ts={ts}&as={_as}&mas={mas}'
ACCOUNT_HEADERS = {
'accept': 'application/json, text/javascript',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8',
'content-type': 'application/x-www-form-urlencoded',
# 'cookie:WEATHER_CITY=%E5%8C%97%E4%BA%AC; UM_distinctid=160b9d56b07aa1-0322b0f040feaa-32607e02-13c680-160b9d56b08f0f; uuid="w:38808a712dd144679f3b524be9378a9e"; __utmc=24953151; __utmz=24953151.1515146646.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=24953151.940475929.1515146646.1515146646.1515149010.2; _ga=GA1.2.940475929.1515146646; utm_source=toutiao; tt_webid=74484831828; CNZZDATA1259612802=2101983015-1514944629-https%253A%252F%252Fwww.google.com.ph%252F%7C1519894157; __tasessionId=ni5yklwbb1519898143309; tt_webid=6527912825075254788
'dnt': '1',
# 'referer': 'https://www.toutiao.com/search/?keyword={}',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
'x-requested-with': 'XMLHttpRequest'
}
MONGODB_URI = 'mongodb://wifi:zenmen@10.19.83.217'
MONGODB_DB = 'toutiao_app'
MONGODB_COLLECTION_NEWS = 'news'
MONGODB_COLLECTION_DETAIL = 'detail'
MONGODB_COLLECTION_COMMENTS = 'comments'
REDIS_HOST = '54.169.202.250'
请有人帮我解决这个问题吗?
谢谢!