Enabling accented characters in Scrapy's JSON export?

Date: 2017-01-30 11:02:28

Tags: css json scrapy

I am using Python 2.7 and Scrapy 1.3. My Scrapy code is:

import scrapy

class CinemaSpider(scrapy.Spider):
    name = "cinema"
    allowed_domains = ['cineroxy.com.br']
    start_urls = [
        'http://cineroxy.com.br/programacao-brisamar',
    ]

    def parse(self, response):
        movie_names = response.css('.titulo p::text').extract()
        for movie_name in movie_names:
            yield {
                'name': movie_name.strip()
            }

I run it like this:

C:\Python27\Scripts>scrapy runspider cinema_scraper.py -o movies.json

Result:

[
{"name": "A Bailarina"},
{"name": "Assassins Creed - O Filme"},
{"name": "Cinquenta Tons Mais Escuros"},
{"name": "Minha M\u00e3e \u00e9 uma Pe\u00e7a 2"},
{"name": "Moana - Um Mar de Aventura"},
{"name": "Os Penetras 2 - Quem D\u00e1 Mais?"},
{"name": "Quatro Vidas de Um Cachorro"},
{"name": "Resident Evil 6: O \u00daltimo Cap\u00edtulo"},
{"name": "xXx: Reativado"}
]

How can I fix the accented characters in these entries?

Minha M\u00e3e \u00e9 uma Pe\u00e7a 2
Os Penetras 2 - Quem D\u00e1 Mais?
Resident Evil 6: O \u00daltimo Cap\u00edtulo

Thanks in advance.

2 answers:

Answer 0 (score: 1)

Use the FEED_EXPORT_ENCODING option:

FEED_EXPORT_ENCODING = 'utf-8'
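
For context, the \uXXXX sequences in the question's output are ordinary JSON escapes, so the file is valid and decodes to the accented text; the setting only changes how the characters are written. The sketch below illustrates that difference with the standard json module alone. It does not call Scrapy's exporter, and the `item` dictionary is just one title copied from the question. It is written to run on Python 3.

```python
import json

# One title from the question, written with Python escapes so this
# source file stays ASCII-only.
item = {"name": u"Minha M\u00e3e \u00e9 uma Pe\u00e7a 2"}

# Default behaviour: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(item)

# ensure_ascii=False keeps the characters as they are in the output text.
readable = json.dumps(item, ensure_ascii=False)

# Both forms decode to the same data, so the escaped file is not corrupt.
assert json.loads(escaped) == json.loads(readable) == item

print(escaped)
print(readable)
```

Any JSON parser reading the escaped file gets the accented strings back; the UTF-8 setting mainly makes the file readable when opened in a text editor.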

You can set this option in settings.py, in the spider's custom_settings attribute, or on the command line:

scrapy runspider cinema_scraper.py -s FEED_EXPORT_ENCODING=utf8 -o movies.json

Answer 1 (score: 0)

We can use FEED_EXPORT_ENCODING='utf-8', but it is important to write the option and its value with no spaces between them:

scrapy runspider cinema_scraper.py -s FEED_EXPORT_ENCODING='utf-8' -o movies.json

Note: the single quotes are removed by POSIX shells such as bash. The Windows cmd.exe prompt shown in the question does not treat single quotes as quoting characters, so there it is probably safer to write the value without them, as in FEED_EXPORT_ENCODING=utf-8.