Remove backslashes but keep the Unicode escapes in a JSON string

Date: 2019-12-09 02:37:09

Tags: json python-3.x

I have a JSON string:

>>> a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
>>> type(a)
<class 'str'>

I want to remove the extra backslashes but keep the Unicode escape sequences, and then convert the result to a Python dict/list with json.loads. How can I do this?

I tried three approaches, but none of them worked:

  1. a.replace('\\', '')

    This removes the backslashes, but somehow my Unicode notation disappears as well.

    >>> a.replace('\\', '')  # the result looks OK, but the Unicode notation is lost
    '[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "u8aaau660e1", "location": "u6c34u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "u8aaau660e2", "location": "u6a4bu6a11"}]'
    
  2. json.loads(a) raises an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
        return _default_decoder.decode(s)
      File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
        obj, end = self.scan_once(s, idx)
    json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
    
  3. a.decode('utf-8') fails as well, because in Python 3 a str has no decode method:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
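All three failures come from what the string value actually contains, as opposed to how the literal is typed at the prompt. A quick check makes this visible; it is only a sketch that reuses the question's own literal, and the slice lengths are arbitrary:

```python
# The string from the question, copied verbatim.
a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'

# print() shows the real characters: each quote is preceded by one
# backslash, and each uXXXX escape by two backslashes.
print(a[:60])

# Index 2 holds a backslash where JSON expects a quote, which is the
# position json.loads reports: "line 1 column 3 (char 2)".
print(a[2] == '\\')

# replace('\\', '') removes BOTH backslashes in front of each uXXXX,
# so the escape marker is gone and "u8aaa" is left as plain text.
print(a.replace('\\', '')[:80])
```

In other words, the JSON document has been escaped one extra time: every quote carries one backslash too many, and every Unicode escape carries two backslashes instead of one. Only that extra layer should be removed.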
    

2 Answers:

Answer 0 (score: 0)

If you only need to remove the backslashes and keep the Unicode escapes:

import re

a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print(a)
print('\n')

# Turn every backslash-quote pair into a plain quote.
b = re.sub(r'\\"', '"', a)
# Collapse the doubled backslash in front of each uXXXX to a single one,
# which turns it into a valid JSON \uXXXX escape.
b = re.sub(r'\\\\u', r'\\u', b)
print(b)

Output:

[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]

[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "\u8aaa\u660e1", "location": "\u6c34\u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "\u8aaa\u660e2", "location": "\u6a4b\u6a11"}]
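The second line of output above is already valid JSON, so a sketch like the following, which passes it straight to json.loads, should give a list of two dictionaries directly. json.loads accepts a top-level array just as it accepts a top-level object:

```python
import json
import re

a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'

# Same two substitutions as above: backslash-quote -> quote,
# and double backslash before u -> single backslash.
b = re.sub(r'\\"', '"', a)
b = re.sub(r'\\\\u', r'\\u', b)

# The JSON array becomes a Python list; each object becomes a dict.
data = json.loads(b)
for item in data:
    print(item)
```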

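If you need to work with this data afterwards, you can parse it into Python objects. Since the data is an array of two dictionaries, I would split it and parse each dictionary separately: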

import json
import re

a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print(a)

dictionaries = []

# Split the array text into one chunk per dictionary. This assumes no value
# contains braces, square brackets, or the text '}, {'.
substrings_for_dictionaries = a.split('}, {')

for substring in substrings_for_dictionaries:
    # Drop the braces and square brackets left over from the split.
    substring = re.sub(r'[{}]', '', substring)
    substring = re.sub(r'[\[\]]', '', substring)
    # Same two fixes as above: backslash-quote -> quote, \\u -> \u.
    substring = re.sub(r'\\"', '"', substring)
    substring = re.sub(r'\\\\u', r'\\u', substring)
    # Re-wrap the chunk so it is a complete JSON object again.
    substring = '{' + substring + '}'
    dictionary = json.loads(substring)
    dictionaries.append(dictionary)

for dictionary in dictionaries:
    print(dictionary)

Output:

[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]
{'pic': 'QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97', 'note': '說明1', 'location': '水池'}
{'pic': 'QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP', 'note': '說明2', 'location': '橋樑'}

Answer 1 (score: 0)

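Personally, I would use a parser for the language the string was extracted from, but since you didn't say which language that is, I used the unicode-escape decoder from Python's codecs module to do the job. It works in most cases, but it may break where that language supports different escape sequences than Python does.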

import codecs
import json

s = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
# Interpret Python-style escapes: backslash-quote becomes a quote and a
# double backslash becomes a single one, leaving valid JSON \uXXXX escapes.
unescaped = codecs.decode(s, 'unicode-escape')
obj = json.loads(unescaped)
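In the spirit of using the original language's parser, here is a sketch that assumes (the question does not confirm this) that the extra layer of escaping is plain JSON string escaping, for example because the document went through json.dumps twice. Under that assumption, JSON's own parser can undo both layers:

```python
import json

s = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'

# Outer layer (assumed to be JSON string escaping): wrap s in quotes and
# parse it as a JSON string, turning backslash-quote into a quote and a
# double backslash into a single one.
inner = json.loads('"' + s + '"')

# Inner layer: inner is now ordinary JSON text; parsing it turns each
# uXXXX escape into the actual character.
obj = json.loads(inner)
print(obj)
```

Unlike unicode-escape, this path only understands JSON's escape rules, so an escape that JSON does not define (such as \x41) raises json.JSONDecodeError instead of being interpreted the Python way.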