Question

I am switching from Python 2 to Python 3.

In my Jupyter notebook, the code is:

file = "./data/test.json" 
with open(file) as data_file:    
    data = json.load(data_file)

This worked fine under Python 2, but after simply switching to Python 3 it gives me this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 123: illegal multibyte sequence

The test.json file looks like this:

[{
    "name": "Daybreakers",
    "detail_url": "http://www.movieinsider.com/m4120/daybreakers/",
    "movie_tt_id": "中文"
  }]

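If I remove the Chinese text (中文), the error goes away.

So what should I do?

There are many similar questions on SO, but I could not find a good solution for my case. If you find one that applies, please let me know and I will close this question.

Thanks a lot!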

Answer 1

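You need to specify the correct encoding when opening the file. Without an explicit encoding, Python 3's open() decodes text using the platform's default encoding, which the error message shows is GBK on your system. If the JSON is encoded as UTF-8, you can do this: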

import json

fname = "test.json" 
with open(fname, encoding='utf-8') as data_file:    
    data = json.load(data_file)

print(data)

Output:

[{'name': 'Daybreakers', 'detail_url': 'http://www.movieinsider.com/m4120/daybreakers/', 'movie_tt_id': '中文'}]
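
To see why the encoding matters, here is a minimal, self-contained sketch that reproduces the problem and the fix without depending on your system's default encoding. It assumes the file was saved as UTF-8; the temporary file and the variable names (records, gbk_failed, data_from_bytes) are made up for illustration. It also shows one possible alternative, assuming Python 3.6 or later: opening the file in binary mode, in which case json.load detects UTF-8, UTF-16 or UTF-32 on its own.

```python
import json
import os
import tempfile

# Hypothetical sample data mirroring the test.json from the question.
records = [{"name": "Daybreakers", "movie_tt_id": "中文"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.json")

    # Write the file as UTF-8, keeping the non-ASCII characters unescaped.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)

    # Decoding the UTF-8 bytes as GBK reproduces the error from the question.
    try:
        with open(path, encoding="gbk") as f:
            json.load(f)
        gbk_failed = False
    except UnicodeDecodeError as exc:
        gbk_failed = True
        print("Reading as GBK failed:", exc.reason)

    # Decoding with the encoding the file was written in succeeds.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    # Alternative (assumes Python 3.6+): read raw bytes and let the
    # json module detect the encoding itself.
    with open(path, "rb") as f:
        data_from_bytes = json.load(f)

print(data)
```

Passing encoding='utf-8' explicitly is probably the clearer choice when you know how the file was saved, since the code then behaves the same on every machine regardless of its locale.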