Question

我有一个json字符串：

>>> a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
>>> type(a)
<class 'str'>

我想删除\\，但仍然保留Unicode转义序列。最终使用json.loads转换为python dict / list。我该怎么办？

尝试了三种方法，但是没有用：

a.replace('\\', '')

它可以删除'\'，但是以某种方式我的unicode标记消失了。

>>> a.replace('\\', '') result seems OK but lost the unicode notation
'[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "u8aaau660e1", "location": "u6c34u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "u8aaau660e2", "location": "u6a4bu6a11"}]'

json.loads(a)收到错误消息

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

a.decode('utf-8')

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

Answer 1

如果您只需要删除反斜杠并保留unicode：

import re

a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print (a)
print ('\n')

b = re.sub(r'\\"', '"', a)
b = re.sub(r'\\\\u', r'\\u', b)
print (b)

它给出：

[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]

[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "\u8aaa\u660e1", "location": "\u6c34\u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "\u8aaa\u660e2", "location": "\u6a4b\u6a11"}]

如果以后需要使用这些数据，则可能会有转换为json的问题，因为您有2个字典的数组。我会这样解决：

import json
import re

a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print (a)

dictionaries = []

substrings_for_dictionaries = a.split(r'}, {')

for substring in substrings_for_dictionaries:
    substring = re.sub(r'[{}]', '', substring)
    substring = re.sub(r'[\[\]]', '', substring)
    substring = re.sub(r'\\"', '"', substring)
    substring = re.sub(r'\\\\u', r'\\u', substring)
    substring = '{' + substring + '}'
    dictionary = json.loads(substring)
    dictionaries.append(dictionary)


for dictionary in dictionaries:
    print (dictionary)

结果是：

[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]
{'pic': 'QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97', 'note': '說明1', 'location': '水池'}
{'pic': 'QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP', 'note': '說明2', 'location': '橋樑'}

Answer 2

就个人而言，我将使用提取字符串的语言的解析器，但是由于您没有提及，因此我使用Python的编解码器的字符串转义解码来完成这项工作。它适用于大多数情况，但在语言在支持的转义序列不同的情况下可能会中断。

import codecs
import json

s = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
unescaped = codecs.decode(s, 'unicode-escape')
obj = json.loads(unescaped)

删除反斜杠，但将unicode保留在json字符串中

2 个答案: