Why am I getting SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

Time: 2015-04-17 23:37:41

Tags: python json unicode utf-8

I got some JSON data from an API. I ran json.loads on it and printed the result to the REPL, as shown below.

  {'warnings': {'query': {'*': "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}}, 'query-continue': {'links': {'plcontinue': '25618423|10|R_from_other_capitalisation', 'gplcontinue': "15095968|0|1991_US_Open_-_Women's_Doubles"}}, 'query': {'pages': {'32203010': {'pageid': 32203010, 'title': "1988 Australian Open - Women's Doubles", 'ns': 0}, '25618558': {'pageid': 25618558, 'title': "1984 Wimbledon Championships - Women's Singles", 'ns': 0}, '29486043': {'pageid': 29486043, 'title': "1984 Wimbledon Championships - Women's Doubles", 'ns': 0}, '25618819': {'pageid': 25618819, 'title': "1986 US Open - Women's Singles", 'ns': 0}, '25619314': {'pageid': 25619314, 'title': "1989 US Open - Women's Singles", 'ns': 0}, '25618668': {'pageid': 25618668, 'title': "1985 US Open - Women's Singles", 'ns': 0}, '25618857': {'pageid': 25618857, 'title': "1987 Australian Open - Women's Singles", 'ns': 0}, '25618423': {'links': [{'title': "1983 Wimbledon Championships – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}], 'pageid': 25618423, 'title': "1983 Wimbledon Championships - Women's Singles", 'ns': 0}, '23826062': {'links': [{'title': "1984 French Open – Women's Singles", 'ns': 0}, {'title': 'Wikipedia:Mainspace', 'ns': 4}, {'title': 'Template:R from long name', 'ns': 10}, {'title': 'Template:R from other capitalisation', 'ns': 10}, {'title': 'Template:R from plural', 'ns': 10}, {'title': 'Template:R from short name', 'ns': 10}, {'title': 'Category:Redirects from modifications', 'ns': 14}], 'pageid': 23826062, 'title': "1984 French Open - Women's Singles", 'ns': 0}, '25619177': {'pageid': 25619177, 'title': "1989 Australian Open - Women's Singles", 'ns': 0}}}}

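I then copied the data from the REPL into a .py module and assigned it to a variable so that I could run some unit tests. But I keep getting this error: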

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

What is going on?

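Update: here is exactly how I get the error. Using Visual Studio, I ran a script that uses Requests and .text to fetch the content. I then applied json.loads and printed the result to the Visual Studio Python 3.4 Interactive window (a.k.a. the REPL). Then I used the mouse to copy the output from this REPL and pasted it into a .py file in Visual Studio.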

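Update 2: When I fetch the data, I use Requests and then its text attribute. Printing this without json.loads works fine. But if I copy this "more raw" output from the REPL and paste it into the module, it is no longer a string but an object, and json.loads will not work on it. Does the Python 3 print function print an object even when it is supposed to be JSON?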

Here is the raw output from Requests' .text, without json.loads:

{"warnings":{"query":{"*":"Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."}},"query-continue":{"links":{"plcontinue":"25618423|10|R_from_other_capitalisation","gplcontinue":"15095968|0|1991_US_Open_-_Women's_Doubles"}},"query":{"pages":{"25618423":{"pageid":25618423,"ns":0,"title":"1983 Wimbledon Championships - Women's Singles","links":[{"ns":0,"title":"1983 Wimbledon Championships \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"}]},"23826062":{"pageid":23826062,"ns":0,"title":"1984 French Open - Women's Singles","links":[{"ns":0,"title":"1984 French Open \u2013 Women's Singles"},{"ns":4,"title":"Wikipedia:Mainspace"},{"ns":10,"title":"Template:R from long name"},{"ns":10,"title":"Template:R from other capitalisation"},{"ns":10,"title":"Template:R from plural"},{"ns":10,"title":"Template:R from short name"},{"ns":14,"title":"Category:Redirects from modifications"}]},"29486043":{"pageid":29486043,"ns":0,"title":"1984 Wimbledon Championships - Women's Doubles"},"25618558":{"pageid":25618558,"ns":0,"title":"1984 Wimbledon Championships - Women's Singles"},"25618668":{"pageid":25618668,"ns":0,"title":"1985 US Open - Women's Singles"},"25618819":{"pageid":25618819,"ns":0,"title":"1986 US Open - Women's Singles"},"25618857":{"pageid":25618857,"ns":0,"title":"1987 Australian Open - Women's Singles"},"32203010":{"pageid":32203010,"ns":0,"title":"1988 Australian Open - Women's Doubles"},"25619177":{"pageid":25619177,"ns":0,"title":"1989 Australian Open - Women's Singles"},"25619314":{"pageid":25619314,"ns":0,"title":"1989 US Open - Women's Singles"}}}}

1 answer:

Answer 0: (score: 4)

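You have EN DASH (U+2013) characters in your text. In the Windows-1252 codec they map to the byte \x96. You have an encoding problem, but the exact cause depends on the steps you took to copy the text into the .py file. I cut and pasted the text from your question into Notepad++ with the encoding set to ANSI, assigned it to a variable, and simply got: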

  File "C:\temp.py", line 1
SyntaxError: unknown decode error

But with UTF-8 or UTF-8 without BOM chosen as the encoding, it works fine. Without a #coding: comment declaring the source encoding, Python 3 assumes UTF-8.
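To see where the 0x96 in the error message comes from, here is a minimal sketch (the variable names are only illustrative). It encodes a single EN DASH both ways and then tries to read the Windows-1252 byte back as UTF-8, which is roughly what Python 3 attempts when the file was saved as ANSI.

```python
# EN DASH (U+2013), the character in "1983 Wimbledon Championships – Women's Singles".
en_dash = "\u2013"

# The same character as written to disk under two different file encodings.
cp1252_bytes = en_dash.encode("windows-1252")
utf8_bytes = en_dash.encode("utf-8")
print(cp1252_bytes)  # b'\x96'
print(utf8_bytes)    # b'\xe2\x80\x93'

# Reading the Windows-1252 byte as UTF-8 fails: 0x96 can only be a
# continuation byte in UTF-8, so it is not a valid start byte.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = str(exc)
    print(decode_error)
```

The printed message should match the tail of the SyntaxError in the question.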

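Note that ANSI on my US Windows system is indeed Windows-1252. Using ANSI and adding #coding:windows-1252 also works correctly. Python needs to know the source encoding if it differs from the default (ascii on Python 2 and utf-8 on Python 3).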
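The effect of the #coding: declaration can be reproduced without an editor by compiling raw source bytes directly. This is only a sketch: run_source and the "<pasted module>" filename are hypothetical names made up for the illustration, and the one-line source stands in for the pasted data.

```python
# One line of module source with an EN DASH (U+2013) inside a string literal.
source_text = 'title = "1983 Wimbledon Championships \u2013 Women\'s Singles"\n'


def run_source(source_bytes):
    """Compile and run raw source bytes; return `title` or a description of the error."""
    try:
        namespace = {}
        exec(compile(source_bytes, "<pasted module>", "exec"), namespace)
        return namespace["title"]
    except (SyntaxError, UnicodeError) as exc:
        return "{}: {}".format(type(exc).__name__, exc)


ansi_source = source_text.encode("windows-1252")  # bytes an "ANSI" save would write
utf8_source = source_text.encode("utf-8")         # bytes a UTF-8 save would write

# No declaration: Python 3 assumes UTF-8 and rejects the 0x96 byte.
no_cookie_result = run_source(ansi_source)
# Same bytes, with the encoding declared on the first line.
with_cookie_result = run_source(b"# coding: windows-1252\n" + ansi_source)
# UTF-8 bytes need no declaration on Python 3.
utf8_result = run_source(utf8_source)

print(no_cookie_result)
print(with_cookie_result)
print(utf8_result)
```

Only the first call should report an error. The other two return the title with the en dash intact.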