How do I get rid of nested double quotes in the "name" subfield?

Date: 2019-02-08 17:00:35

Tags: python json regex double-quotes

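I am trying to read the following string into a dictionary using Python's json package.

However, the "name" subfield holds a description that contains nested double quotes, and json cannot read the string in this form.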

import json 

string1 = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""'

json.loads(string1)

raises the error

JSONDecodeError: Expecting ',' delimiter: line 1 column 96 (char 95)

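I know the error is caused by the nested double quotes around "The Sunshine Makers".

How can I get rid of the double quotes?

More examples of strings that cause the error: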

string2 = '{"id":960066,"project_id":960066,"state":"active","state_changed_at":1502049940,"name":"New J. Lye Album - Behind The Lyes","blurb":"I am working on my new project titled "Behind The Lyes" which is coming out fall of 2017."'

# The problem with this string comes from the nested double quotes around the phrase "Behind The Lyes" inside the 'blurb' subfield

1 Answer:

Answer 0: (score: 0)

Note that your string has several problems that make it invalid JSON:

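The string contains \xa0 (a non-breaking space). That character is legal inside a JSON string, so it is not what raises the error at char 95 (that position is the first character after the nested quote), but it is worth normalizing to a plain space while fixing the "" problem.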

Your string is missing the closing }.
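To check where the parser actually stops, the exception itself can be inspected. This is a minimal sketch, assuming Python 3.5 or later, where json.JSONDecodeError exposes msg and pos; the names doc and failure are only illustrative, and the closing brace has been added by hand.

```python
import json

# string1 from the question, with the missing closing brace added
doc = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""}'

failure = None
try:
    json.loads(doc)
except json.JSONDecodeError as err:
    failure = err
    # msg is the parser's complaint, pos is the 0-based offset it stopped at
    print(err.msg, "at char", err.pos)
    # repr() makes invisible characters such as \xa0 visible
    print(repr(doc[max(err.pos - 10, 0):err.pos + 10]))
```

On this string the reported offset is char 95, the same position as in the question.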

That said, for the first string you quoted, one way to fix the problem is to use .replace():

string1 = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""'.replace('\xa0"', "'").replace('""', "'\"") + '}'
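As a quick check of the .replace() chain above, the repaired string can be passed straight to json.loads. This sketch only restates that line with illustrative names (raw, fixed, data); note that the inner double quotes come out as single quotes and the \xa0 is dropped together with the quote that followed it.

```python
import json

raw = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""'

# 1) \xa0 followed by the opening inner quote becomes a single quote
# 2) the doubled quote at the end becomes a single quote plus the real closing quote
# 3) the missing closing brace is appended
fixed = raw.replace('\xa0"', "'").replace('""', "'\"") + '}'

data = json.loads(fixed)
print(data['name'])
```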

For example, the following handles the double quotes and the other problems in both example strings:

import json 

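# Ordered (old, new) pairs: normalize \xa0, turn every " into ', then restore " only where it is structural JSON
fixes = [('\xa0', ' '),('"',"'"),("{'",'{"'),("','", '","'),(",'", ',"'),("':'", '":"'),("':", '":'),("''", '\'\"'), ("'}",'"}')]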

print(fixes)
string1 = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""'
string2 = '{"id":960066,"project_id":960066,"state":"active","state_changed_at":1502049940,"name":"New J. Lye Album - Behind The Lyes","blurb":"I am working on my new project titled "Behind The Lyes" which is coming out fall of 2017."'
strings = [string1, string2]

for string in strings:
    print(string)
    string = string + '}'  # add the missing closing brace
    for fix in fixes:
        string = string.replace(*fix)  # order matters: apply the pairs in sequence
    print(string)
    print(json.loads(string)['name'])
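If the inner double quotes should be kept rather than turned into single quotes, another option is to let the parser point at each stray quote and escape it. The sketch below is a heuristic, not a general JSON repair tool: the function name loads_with_quote_repair and the max_repairs parameter are made up for illustration, and it assumes the only defect (after adding the closing brace) is unescaped quotes inside string values that are not directly followed by a comma, colon, or brace.

```python
import json


def loads_with_quote_repair(text, max_repairs=20):
    """Parse text as JSON, escaping stray inner double quotes along the way.

    Heuristic: when the parser reports "Expecting ',' delimiter", a string
    value probably ended too early, so the nearest '"' before the reported
    position is treated as part of the value and escaped.
    """
    for _ in range(max_repairs):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # Only handle the failure from the question; an error at the very
            # end of the text means truncation, which this does not repair.
            if not err.msg.startswith("Expecting ',' delimiter") or err.pos >= len(text):
                raise
            quote_at = text.rfind('"', 0, err.pos)
            if quote_at == -1:
                raise
            # Insert a backslash so the quote becomes part of the string value
            text = text[:quote_at] + '\\' + text[quote_at:]
    raise ValueError("gave up after %d repairs" % max_repairs)


string1 = '{"id":17033,"project_id":17033,"state":"active","state_changed_at":1488054590,"name":"a.k.a.:\xa0"The Sunshine Makers""'
string2 = '{"id":960066,"project_id":960066,"state":"active","state_changed_at":1502049940,"name":"New J. Lye Album - Behind The Lyes","blurb":"I am working on my new project titled "Behind The Lyes" which is coming out fall of 2017."'

# As above, the missing closing brace is added before parsing
parsed1 = loads_with_quote_repair(string1 + '}')
parsed2 = loads_with_quote_repair(string2 + '}')
print(parsed1['name'])
print(parsed2['blurb'])
```

Unlike the replacement table, this leaves \xa0 in the value, so normalize it separately if a plain space is wanted.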

It would help if you could add to the question the code or file these strings are retrieved from. That would make a more complete answer possible.