Question

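I am trying to write the output of a scraped XML file to JSON. The scrape fails because an item is not serializable.

This question advises that you need to build a pipeline, but the answer does not show how because that was out of scope for the question: SO scrapy serializer

So I turned to the Scrapy docs, which illustrate an example; however, the docs then advise against using it:

The purpose of JsonWriterPipeline is just to introduce how to write item pipelines. If you really want to store all scraped items into a JSON file you should use the Feed exports.

If I go to Feed exports, this is shown: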

JSON

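FEED_FORMAT: json. Exporter used: JsonItemExporter. See this warning if you're using JSON with large feeds.

My question still remains because, as I understand it, this is meant to be run from the command line, like so: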

scrapy runspider myxml.py -o ~/items.json -t json

However, this produces the very error I was hoping to solve with a pipeline:

TypeError: <bound method SelectorList.extract of [<Selector xpath='.//@venue' data=u'Royal Randwick'>]> is not JSON serializable

How do I create a JSON pipeline to fix the JSON serialization error?

This is my code.

# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
from scrapy.selector import XmlXPathSelector
from conv_xml.items import ConvXmlItem
# https://stackoverflow.com/a/27391649/461887
import json


class MyxmlSpider(scrapy.Spider):
    name = "myxml"

    start_urls = (
        ["file:///home/sayth/Downloads/20160123RAND0.xml"]
    )

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//meeting')
        items = []

        for site in sites:
            item = ConvXmlItem()
            item['venue'] = site.xpath('.//@venue').extract
            item['name'] = site.xpath('.//race/@id').extract()
            item['url'] = site.xpath('.//race/@number').extract()
            item['description'] = site.xpath('.//race/@distance').extract()
            items.append(item)

        return items


        # class JsonWriterPipeline(object):
        #
        #     def __init__(self):
        #         self.file = open('items.jl', 'wb')
        #
        #     def process_item(self, item, spider):
        #         line = json.dumps(dict(item)) + "\n"
        #         self.file.write(line)
        #         return item

Answer 1

The problem is here:

item['venue'] = site.xpath('.//@venue').extract

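You forgot to call extract, so the item stores the bound method itself rather than the list of extracted values. Replace it with: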

item['venue'] = site.xpath('.//@venue').extract()
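To see why the missing parentheses trigger the error, here is a minimal sketch that does not need Scrapy at all. `FakeSelectorList` and `try_dump` are hypothetical names made up for this illustration; the class only mimics the one behaviour that matters here, an `extract()` method that returns a plain list.

```python
import json


class FakeSelectorList:
    """Hypothetical stand-in for Scrapy's SelectorList (not the real class)."""

    def __init__(self, values):
        self._values = list(values)

    def extract(self):
        # Return the matched values as a plain list of strings.
        return list(self._values)


def try_dump(item):
    """Return the JSON string, or the TypeError message if serialization fails."""
    try:
        return json.dumps(item)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


venue = FakeSelectorList(["Royal Randwick"])

# Without parentheses the dict holds the method object, which json cannot encode.
broken = try_dump({"venue": venue.extract})

# With parentheses the dict holds a list of strings, which json encodes fine.
fixed = try_dump({"venue": venue.extract()})

print(broken)
print(fixed)
```

The first print shows a TypeError message and the second shows the serialized item. With the call fixed, the original `scrapy runspider myxml.py -o ~/items.json -t json` command should be able to export the items without a custom pipeline.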
