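Getting an 'encode' error when trying to output data to a MySQL table with Scrapy

Time: 2013-11-13 02:00:39

Tags: python mysql scrapy

I am new to Python and Scrapy and am trying to output my crawled data to my MySQL database, but I get the following error: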

exceptions.AttributeError: 'list' object has no attribute 'encode'

Here is my pipeline code:

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user='User', passwd='passwd', db='db', host='host', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):    
        try:
            self.cursor.execute("""INSERT INTO Teams (Country, CountryFlagLink, TeamWikiURL, MethodOfQualification, DateOfQualification, FinalsAppearance, LastAppearance, PreviousBestPerformance, FifaRankingAsOfOct2013)  
                        VALUES (%s, %s)""", 
                       (item['Country'].encode('utf-8'),
                        item['CountryFlagLink'].encode('utf-8'),
                        item['TeamWikiURL'].encode('utf-8'),
                        item['MethodOfQualification'].encode('utf-8'),
                        item['DateOfQualification'].encode('utf-8'),
                        item['FinalsAppearance'].encode('utf-8'),
                        item['LastAppearance'].encode('utf-8'),
                        item['PreviousBestPerformance'].encode('utf-8'),
                        item['FifaRankingAsOfOct2013'].encode('utf-8')))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

Here is the full stack trace from after I crawl the site and try to import the data into my MySQL database:

ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Ecuad
r'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/e
e8/Flag_of_Ecuador.svg/23px-Flag_of_Ecuador.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'22'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2006'],
         'MethodOfQualification': [u'CONMEBOL Round Robin 4th place'],
         'PreviousBestPerformance': [u'Round of 16 (2006)'],
         'TeamWikiURL': [u'/wiki/Ecuador_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\mi
dleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\ut
ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Hondu
as'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/8
82/Flag_of_Honduras.svg/23px-Flag_of_Honduras.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'34'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2010'],
         'MethodOfQualification': [u'CONCACAF Fourth Round 3rd place'],
         'PreviousBestPerformance': [u'Group stage (1982, 2010)'],
         'TeamWikiURL': [u'/wiki/Honduras_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\mi
dleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\ut
ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] INFO: Closing spider (finished)
2013-11-12 19:36:33-0600 [wikitut] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 246,
         'downloader/request_count': 1,
         'downloader/request_method_count/GET': 1,
         'downloader/response_bytes': 72797,
         'downloader/response_count': 1,
         'downloader/response_status_count/200': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2013, 11, 13, 1, 36, 33, 840000),
         'log_count/DEBUG': 7,
         'log_count/ERROR': 22,
         'log_count/INFO': 3,
         'response_received_count': 1,
         'scheduler/dequeued': 1,
         'scheduler/dequeued/memory': 1,

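I have a MySQL database set up with all the required fields (all varchar), and the collation is set to utf8_general_ci. I am lost as to why I get the error above. Could someone explain what I am doing wrong?

1 answer:

Answer 0: (score: 2)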

Judging by your error message, item['Country'] appears to be a list containing one element. See 'Country': [u'Honduras'] in the log.

So you need to edit the code like this:
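To make the cause concrete, here is a minimal sketch of what the log suggests each field looks like. It is written for Python 3 rather than the Python 2.7 in the question, and the variable name country is only illustrative; the point is that the list wrapper has no encode method, while the string inside it does.

```python
# One field shaped like the logged item: a list holding a single string.
country = ['Ecuador']

# The list itself has no encode method, which is what the traceback reports.
print(hasattr(country, 'encode'))

# The string inside the list does have one.
print(hasattr(country[0], 'encode'))

# Indexing the first element before encoding avoids the AttributeError.
encoded = country[0].encode('utf-8')
print(encoded)
```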

(item['Country'][0].encode('utf-8'),
item['CountryFlagLink'][0].encode('utf-8'),
item['TeamWikiURL'][0].encode('utf-8'),
item['MethodOfQualification'][0].encode('utf-8'),
item['DateOfQualification'][0].encode('utf-8'),
item['FinalsAppearance'][0].encode('utf-8'),
item['LastAppearance'][0].encode('utf-8'),
item['PreviousBestPerformance'][0].encode('utf-8'),
item['FifaRankingAsOfOct2013'][0].encode('utf-8')))

I am not a Python user, so I may be wrong.
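Indexing with [0] will itself fail with an IndexError if a field comes back as an empty list. Separately, the INSERT in the question names nine columns but has only two %s placeholders, so the query would likely fail once the encode error is fixed. The sketch below is one possible way to handle both, under stated assumptions. The names FIELDS, first_value and build_insert are hypothetical helpers, not part of Scrapy or MySQLdb, and the code is written for Python 3.

```python
# Column names taken from the INSERT statement in the question.
FIELDS = [
    'Country', 'CountryFlagLink', 'TeamWikiURL', 'MethodOfQualification',
    'DateOfQualification', 'FinalsAppearance', 'LastAppearance',
    'PreviousBestPerformance', 'FifaRankingAsOfOct2013',
]


def first_value(value, default=''):
    """Return the first element of a list or tuple, or the value unchanged.

    An empty list yields `default` instead of raising IndexError.
    """
    if isinstance(value, (list, tuple)):
        return value[0] if value else default
    return value


def build_insert(item, table='Teams'):
    """Build the SQL text and the parameter tuple for one scraped item."""
    columns = ', '.join(FIELDS)
    # One %s placeholder per column, so the counts always match.
    placeholders = ', '.join(['%s'] * len(FIELDS))
    sql = 'INSERT INTO {} ({}) VALUES ({})'.format(table, columns, placeholders)
    # A field missing from the item becomes None (NULL in the database).
    params = tuple(first_value(item.get(name)) for name in FIELDS)
    return sql, params
```

Inside process_item this could be used as sql, params = build_insert(item) followed by self.cursor.execute(sql, params). Because the connection in the question is opened with charset="utf8" and use_unicode=True, the driver should encode the strings itself, so the explicit .encode('utf-8') calls may not be needed. That is worth checking against your own setup.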