Posting Scrapy results to Parse

Time: 2016-03-16 16:15:45

Tags: parse-platform scrapy scrapy-pipeline

I wrote a pipeline, with these settings:

    PARSE = 'api.parse.com'
    PORT = 443

However, I can't find the right way to post data to Parse: every run creates undefined objects in my Parse DB.

    class Newscrawlbotv01Pipeline(object):
        def process_item(self, item, spider):
            for data in item:
                if not data:
                    raise DropItem("Missing data!")
            connection = httplib.HTTPSConnection(
                settings['PARSE'],
                settings['PORT']
            )
            connection.connect()
            connection.request('POST', '/1/classes/articlulos', json.dumps({item}), {
                "X-Parse-Application-Id": "XXXXXXXXXXXXXXXX",
                "X-Parse-REST-API-Key": "XXXXXXXXXXXXXXXXXXX",
                "Content-Type": "application/json"
            })
            log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
            return item
            #self.collection.update({'url': item['url']}, dict(item), upsert=True)

Example error:

    2016-03-16 20:13:19 [scrapy] ERROR: Error processing {'image': 'http://eedl.eodi.org/wp-content/uploads/sites/3/2016/01/Figaro.png',
     'language': 'FR',
     'publishedDate': u'2016-03-16T18:52:24+01:00',
     'publisher': 'Le Figaro',
     'theme': 'Actualites',
     'title': u'Interpellations Paris: \xable niveau de menace reste tr\xe8s \xe9lev\xe9\xbb selon Hollande',
     'url': u'http://www.lefigaro.fr/flash-actu/2016/03/16/97001-20160316FILWWW00315-interpellations-paris-la-menace-reste-tres-elevee-selon-hollande.php'}
    Traceback (most recent call last):
      File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "C:\Users\simon\Documents\NewsSwipe\PROTOTYPE\v0.1\NewsCrawlBotV0_1\NewsCrawlBotV0_1\pipelines.py", line 49, in process_item
        connection.request('POST', '/1/classes/articlulos', json.dumps({data}), {
      File "c:\python27\lib\json\__init__.py", line 243, in dumps
        return _default_encoder.encode(obj)
      File "c:\python27\lib\json\encoder.py", line 207, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "c:\python27\lib\json\encoder.py", line 270, in iterencode
        return _iterencode(o, 0)
      File "c:\python27\lib\json\encoder.py", line 184, in default
        raise TypeError(repr(o) + " is not JSON serializable")
    TypeError: set(['theme']) is not JSON serializable
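
The last line of the traceback points at the likely cause: in Python, braces around a single value (`{data}` or `{item}`) build a set, and the `json` module cannot encode sets. A minimal sketch of the difference follows, written for Python 3 and using a plain dict as a hypothetical stand-in for the Scrapy item; `to_json_body` is an illustrative helper name, not part of Scrapy or Parse.

```python
import json

# Hypothetical stand-in for the scraped item (a plain dict, not a scrapy.Item).
item = {"publisher": "Le Figaro", "language": "FR", "theme": "Actualites"}


def to_json_body(item):
    """Serialize a mapping-like item as one JSON object."""
    # dict(item) copies the field/value pairs into a plain dict,
    # which json.dumps knows how to encode.
    return json.dumps(dict(item))


# Braces around a single value create a set, which json.dumps rejects.
try:
    json.dumps({"theme"})
    set_failed = False
except TypeError:
    set_failed = True

body = to_json_body(item)
```

Passing `dict(item)` rather than `{item}` to `json.dumps` should therefore produce a JSON object whose keys are the item's fields.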

2 answers:

Answer 0 (score: 0)

You need to use a Pipeline. Its process_item method receives every item the spider outputs, and you can do whatever you like with the item there.
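
As a rough illustration of that idea, here is a minimal Python 3 sketch of a pipeline that posts each item to the endpoint from the question. The class name `ParsePipeline`, its constructor parameters, and the `connection_factory` argument are assumptions made for this example; the host, path, and header names are copied from the question, and the keys are placeholders. Unlike the question's loop, which iterates over the item's field names, this sketch checks the field values.

```python
import http.client
import json
import logging

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch also runs where Scrapy is not installed.
    class DropItem(Exception):
        pass

logger = logging.getLogger(__name__)


class ParsePipeline:
    """Hypothetical pipeline: POST every item to a Parse class as JSON."""

    def __init__(self, host="api.parse.com", port=443,
                 path="/1/classes/articlulos",
                 app_id="XXXXXXXXXXXXXXXX", rest_key="XXXXXXXXXXXXXXXXXXX",
                 connection_factory=http.client.HTTPSConnection):
        self.host = host
        self.port = port
        self.path = path
        self.headers = {
            "X-Parse-Application-Id": app_id,
            "X-Parse-REST-API-Key": rest_key,
            "Content-Type": "application/json",
        }
        # Injectable so the pipeline can be exercised without network access.
        self.connection_factory = connection_factory

    def process_item(self, item, spider):
        data = dict(item)  # plain dict of field name -> value
        missing = [key for key, value in data.items() if not value]
        if missing:
            raise DropItem("Missing data: " + ", ".join(missing))
        connection = self.connection_factory(self.host, self.port)
        try:
            # json.dumps escapes non-ASCII characters by default.
            connection.request("POST", self.path, json.dumps(data), self.headers)
            response = connection.getresponse()
            response.read()
            logger.debug("Parse responded with HTTP %s", response.status)
        finally:
            connection.close()
        return item
```

To try it in a project, the class would be registered under Scrapy's `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.ParsePipeline": 300}`, where the module path is hypothetical.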

Answer 1 (score: 0)

Scrapy has a built-in feed exporter for JSON files; all you need to do is add

    -o example.json

to your scrapy command line. See the Scrapy feed exports documentation.