Scrapy and MySQLdb

Time: 2013-08-05 20:34:04

Tags: python python-2.7 web-scraping scrapy mysql-python

I'm using Python 2.7 on Mac OS X Lion 10.7.5.

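I initially had problems installing MySQLdb, both with pip-2.7 install MySQL-python and by downloading the source and running python2.7 setup.py build followed by python2.7 setup.py install. I tried these approaches with both the 32-bit and 64-bit installs of MySQL and the corresponding architectures, but to no avail.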

My solution was to install MacPorts. I then used MacPorts to install MySQL and MySQL-python (MySQLdb).

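I use Wing IDE to develop my code, so I switched it over to the MacPorts version of Python, and importing MySQLdb works there. I also switched the terminal's default Python to this MacPorts version and verified it by running python from the command line: the correct version started.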

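So now the problem: I'm using Scrapy to scrape a movie website for information. My pipeline sends the scraped data to a database, using the MySQLdb module mentioned above. When I cd into my project on the command line and run scrapy crawl MySpider, I get the following error: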

 raise ImportError, "Error loading object '%s': %s" % (path, e)
 ImportError: Error loading object 'BoxOfficeMojo.pipelines.BoxofficemojoPipeline': No module named MySQLdb.cursors

I have checked and made sure that I can import MySQLdb.cursors from the python2.7 shell, so I think the issue is which version of Python Scrapy is using...
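
One way to test that suspicion is to ask the interpreter itself where it lives and whether it can see the module. The helper below is only a sketch: `check_module` is a hypothetical name, and it reports on whichever interpreter executes it, so it would have to run under the same Python that Scrapy uses (for example, called temporarily from the project's pipelines module).

```python
import importlib
import sys


def check_module(module_name):
    """Return (interpreter path, module file or None) for this interpreter."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # This interpreter cannot see the module on its sys.path.
        return sys.executable, None
    # Built-in modules have no __file__, so fall back to a placeholder.
    return sys.executable, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    executable, location = check_module("MySQLdb.cursors")
    print("interpreter: %s" % executable)
    print("MySQLdb.cursors: %s" % (location or "not importable here"))
```

If the interpreter path printed from inside Scrapy differs from the one printed in the python2.7 shell, the two are not the same installation.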

::::: UPDATE :::::

Here is the full traceback:

 Traceback (most recent call last):
   File "/usr/local/bin/scrapy", line 4, in <module>
     execute()
   File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 131, in execute
     _run_print_help(parser, _run_command, cmd, args, opts)
   File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 76, in _run_print_help
     func(*a, **kw)
   File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 138, in _run_command
     cmd.run(args, opts)
   File "/Library/Python/2.7/site-packages/scrapy/commands/crawl.py", line 43, in run
     spider = self.crawler.spiders.create(spname, **opts.spargs)
   File "/Library/Python/2.7/site-packages/scrapy/command.py", line 33, in crawler
     self._crawler.configure()
   File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 41, in configure
     self.engine = ExecutionEngine(self, self._spider_closed)
   File "/Library/Python/2.7/site-packages/scrapy/core/engine.py", line 63, in __init__
     self.scraper = Scraper(crawler)
   File "/Library/Python/2.7/site-packages/scrapy/core/scraper.py", line 66, in __init__
     self.itemproc = itemproc_cls.from_crawler(crawler)
   File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 50, in from_crawler
     return cls.from_settings(crawler.settings, crawler)
   File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 29, in from_settings
     mwcls = load_object(clspath)
   File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 39, in load_object
     raise ImportError, "Error loading object '%s': %s" % (path, e)
 ImportError: Error loading object 'BoxOfficeMojo.pipelines.BoxofficemojoPipeline': No module named MySQLdb.cursors
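
The traceback shows the scrapy script at /usr/local/bin/scrapy loading Scrapy from /Library/Python/2.7/site-packages, which is not under /opt/local where MacPorts installs its packages. A quick way to see which interpreter a script like that launches is to read its first line. This is a minimal sketch; `read_shebang` is a hypothetical helper name, and it assumes the script is a plain text file.

```python
import io


def read_shebang(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with io.open(script_path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().strip()
    if first_line.startswith("#!"):
        # Drop the "#!" marker and any padding after it.
        return first_line[2:].strip()
    return None
```

Calling `read_shebang("/usr/local/bin/scrapy")` on the affected machine would show whether the script points at the system Python or the MacPorts one.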

::::: UPDATE 2 :::::

Here is my current PATH:

 $PATH
 -bash: /opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/local/sbin:~/bin:/Library/Frameworks/Python.framework/Versions/3.3/bin:/Library/Frameworks/Python.framework/Versions/3.3/bin:/Library/Frameworks/Python.framework/Versions/3.3/bin:/Library/Frameworks/Python.framework/Versions/3.3/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin: No such file or directory
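
Because several of these directories can each hold their own python or scrapy executable, it can help to list every match in PATH order. The sketch below assumes a hypothetical helper named `find_all_on_path`; the first entry it returns is the one the shell would run.

```python
import os


def find_all_on_path(name, path_value):
    """List every executable called name, in the order PATH would find it."""
    matches = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(os.path.expanduser(directory), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

For example, `find_all_on_path("scrapy", os.environ["PATH"])` and the same call with `"python"` show whether the two commands resolve to the same installation.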

:: ALSO ::

I added this to my code in the hope of fixing the problem. It is the location of py27-mysql (MySQLdb), but I get the same error:

 import sys; sys.path.append("/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages")

:: ALSO#2 ::

Below is my pipeline code. I don't know whether it works, because I keep getting the import error, but I thought it might help:

 from scrapy import log
 from twisted.enterprise import adbapi
 import time
 import MySQLdb.cursors
 import sys; sys.path.append("/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages")

 class BoxofficemojoPipeline(object):

     def __init__(self):
         print('init')
         self.dbpool = adbapi.ConnectionPool('MySQLdb', db='testdb', user='testuser', passwd='test', cursorclass=MySQLdb.cursors.DictCursor, charset='utf8', use_unicode=True)

     def process_item(self, item, spider):
         print('process')
         query = self.dbpool.runInteraction(self._conditional_insert, item)  #("""INSERT INTO Example_Movie (title, url, gross, release) VALUES (%s, %s, %s, %s)""", (item['title'].encode('utf-8'), item['url'].encode('utf-8'), item['gross'].encode('utf-8'), item['release'].encode('utf-8')))
         query.addErrback(self.handle_error)  #self.conn.commit()
         return item

     def _conditional_insert(self, tx, item):
         print('conditional insert')
         # Create the record if it doesn't exist.
         # This whole block runs on its own thread.
         tx.execute("select * from example_movie where url = %s", (item['url'], ))
         result = tx.fetchone()
         if result:
             log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
         else:
             tx.execute("insert into example_movie (title, url, gross, release) values (%s, %s, %s, %s)", (item['title'].encode('utf-8'), item['url'].encode('utf-8'), item['gross'].encode('utf-8'), item['release'].encode('utf-8')))
             log.msg("Item stored in db: %s" % item, level=log.DEBUG)

     def handle_error(self, e):
         print('handle_error')
         log.err(e)

1 Answer:

Answer 0: (score: 0)

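Thanks to everyone who took the time to read and look this over. I found the problem after going over my code one last time. I realized that, in my pipeline code, the problematic import MySQLdb.cursors came before import sys; sys.path.append("/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages") (the location of the MySQLdb module). Python resolves an import against sys.path at the moment the import statement runs, so the extra directory was added too late. Moving the import MySQLdb.cursors statement after the import sys; sys.path.append line fixes the import error.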
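
The ordering effect can be reproduced without MySQLdb at all. The following is a minimal sketch, not the actual pipeline: it uses a throwaway temporary directory as a hypothetical stand-in for the MacPorts site-packages directory, and a made-up module name.

```python
import importlib
import os
import sys
import tempfile


def demo_import_order():
    """Return (imported_before_append, value_after_append)."""
    # Hypothetical stand-in for the MacPorts site-packages directory.
    extra_dir = tempfile.mkdtemp()
    module_name = "demo_module_outside_default_path"
    with open(os.path.join(extra_dir, module_name + ".py"), "w") as handle:
        handle.write("VALUE = 42\n")

    # 1. Importing first fails: the directory is not on sys.path yet.
    try:
        importlib.import_module(module_name)
        imported_before = True
    except ImportError:
        imported_before = False

    # 2. Extend sys.path, then import: the module is now found.
    sys.path.append(extra_dir)
    if hasattr(importlib, "invalidate_caches"):  # Python 3 only
        importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    return imported_before, module.VALUE
```

On a fresh interpreter the first import attempt fails and the second succeeds, which mirrors the pipeline: the sys.path.append line has to run before import MySQLdb.cursors.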