I have a JSON string:
>>> a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
>>> type(a)
<class 'str'>
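I want to remove the \\ but still keep the Unicode escape sequences, and finally convert the result to a Python dict / list with json.loads. How should I do this?
I tried three approaches, but none of them worked: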
a.replace('\\', '')
It removes the '\', but my Unicode escape notation disappears as well.
>>> a.replace('\\', '')  # result looks OK, but the Unicode notation is lost
'[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "u8aaau660e1", "location": "u6c34u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "u8aaau660e2", "location": "u6a4bu6a11"}]'
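To see why the escapes vanish, it may help to look at the characters actually stored in the string. The sketch below uses a shortened, hypothetical sample with the same escaping as `a`: each quote is preceded by one backslash and each `u` by two, so removing every backslash also removes the one that `\u` needs.

```python
# Shortened, hypothetical sample with the same escaping as the original string
a = '{\\\"note\\\": \\\"\\\\u8aaa\\\"}'

print(a)                         # the characters actually stored in the string
stripped = a.replace('\\', '')   # removes every backslash, including the one before "u"
print(stripped)
```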
json.loads(a)
This raises an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
a.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
Answer 0 (score: 0)
If you only need to remove the backslashes and keep the Unicode escapes:
import re
a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print(a)
print('\n')
b = re.sub(r'\\"', '"', a)       # \" -> "
b = re.sub(r'\\\\u', r'\\u', b)  # \\u -> \u
print(b)
This prints:
[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]
[{"pic": "QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97", "note": "\u8aaa\u660e1", "location": "\u6c34\u6c60"}, {"pic": "QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP", "note": "\u8aaa\u660e2", "location": "\u6a4b\u6a11"}]
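If you need to work with the data afterwards, converting it to JSON may be a problem, because you have an array of two dictionaries. I would handle it like this: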
import json
import re
a = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
print(a)
dictionaries = []
# Split the array into one substring per dictionary
substrings_for_dictionaries = a.split(r'}, {')
for substring in substrings_for_dictionaries:
    substring = re.sub(r'[{}]', '', substring)       # drop leftover braces
    substring = re.sub(r'[\[\]]', '', substring)     # drop the array brackets
    substring = re.sub(r'\\"', '"', substring)       # \" -> "
    substring = re.sub(r'\\\\u', r'\\u', substring)  # \\u -> \u
    substring = '{' + substring + '}'
    dictionary = json.loads(substring)
    dictionaries.append(dictionary)
for dictionary in dictionaries:
    print(dictionary)
The result is:
[{\"pic\": \"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\", \"note\": \"\\u8aaa\\u660e1\", \"location\": \"\\u6c34\\u6c60\"}, {\"pic\": \"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\", \"note\": \"\\u8aaa\\u660e2\", \"location\": \"\\u6a4b\\u6a11\"}]
{'pic': 'QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97', 'note': '說明1', 'location': '水池'}
{'pic': 'QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP', 'note': '說明2', 'location': '橋樑'}
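Splitting may not be necessary: once the two substitutions from the first snippet have been applied, the text is itself a JSON array, so json.loads can probably parse it in a single call. A minimal sketch, assuming the escaping is exactly as in the question and using a shortened, hypothetical sample (the name `records` is made up for illustration):

```python
import json
import re

# Shortened, hypothetical sample with the same escaping as the original string
a = '[{\\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\"}, {\\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\"}]'

b = re.sub(r'\\"', '"', a)       # \" -> "
b = re.sub(r'\\\\u', r'\\u', b)  # \\u -> \u

records = json.loads(b)          # the whole array is parsed at once, giving a list of dicts
for record in records:
    print(record)
```

This avoids depending on the exact '}, {' separator, which would break if the spacing between objects changed or if a value contained braces.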
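Answer 1 (score: 0)
Personally, I would use a parser for the language the string was extracted from, but since you did not mention which one, I use the escape decoding from Python's codecs module to do the job. It works in most cases, but it may break where the two languages differ in the escape sequences they support.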
import codecs
import json
s = '[{\\\"pic\\\": \\\"QmdYSopPxh46rQ5MjyMK5uw2sBKYVwjUNVoyKFYHb1cR97\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}, {\\\"pic\\\": \\\"QmdNGrc1S9paXycnH7ogdB8w7qDUcWnEFJMPe1Wfb9fYyP\\\", \\\"note\\\": \\\"\\\\u8aaa\\\\u660e2\\\", \\\"location\\\": \\\"\\\\u6a4b\\\\u6a11\\\"}]'
unescaped = codecs.decode(s, 'unicode-escape')
obj = json.loads(unescaped)
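If the string was produced by JSON-encoding the data twice and then dropping the outer quotes (an assumption; the question does not say where it came from), another option is to let the json module undo both levels. The sketch below wraps the text in double quotes so that it forms a JSON string literal, decodes that, and then parses the result. The sample is shortened and hypothetical, and the name `inner` is made up for illustration.

```python
import json

# Shortened, hypothetical sample with the same escaping as the original string
s = '[{\\\"note\\\": \\\"\\\\u8aaa\\\\u660e1\\\", \\\"location\\\": \\\"\\\\u6c34\\\\u6c60\\\"}]'

# First pass: treat s as the body of a JSON string literal,
# which undoes one level of escaping (\" -> " and \\ -> \)
inner = json.loads('"' + s + '"')

# Second pass: parse the now-valid JSON array
obj = json.loads(inner)
print(obj)
```

Because only JSON escape rules are applied here, this variant should leave any literal non-ASCII characters in the input untouched, whereas 'unicode-escape' applied to a str can garble them.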