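I have a JSON file that I need to send. Before sending it, I need to validate it and replace some special characters (spaces and dots (.)).
The problem is that Python inserts a u character before every string, which the server cannot read. How do I remove the u character and sanitize the data (character replacement)?
Original JSON: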
{
    "columns": [
        {
            "data": "Doc.",
            "title": "Doc."
        },
        {
            "data": "Order no.",
            "title": "Order no."
        },
        {
            "data": "Nothing",
            "title": "Nothing"
        }
    ],
    "data": [
        {
            "Doc.": "564251422",
            "Nothing": 0.0,
            "Order no.": "56421"
        },
        {
            "Doc.": "546546545",
            "Nothing": 0.0,
            "Order no.": "98745"
        }
    ]
}
Python:
import json

def func():
    with open('json/simpledata.json', 'r') as json_file:
        json_data = json.load(json_file)
    print(json_data)

func()
Output JSON:
{u'data': [{u'Nothing': 0.0, u'Order no.': u'56421', u'Doc.': u'564251422'}, {u'Nothing': 0.0, u'Order no.': u'98745', u'Doc.': u'546546545'}], u'columns': [{u'data': u'Doc.', u'title': u'Doc.'}, {u'data': u'Order no.', u'title': u'Order no.'}, {u'data': u'Nothing', u'title': u'Nothing'}]}
What I want to achieve in Python:
sanitizeData: function(jsonArray) {
    var newKey;
    jsonArray.forEach(function(item) {
        for (var key in item) {
            newKey = key.replace(/\s/g, '').replace(/\./g, '');
            if (key != newKey) {
                item[newKey] = item[key];
                delete item[key];
            }
        }
    });
    return jsonArray;
},
// remove whitespace and dots from data : <propName> references
sanitizeColumns: function(jsonArray) {
    var dataProp = [];
    jsonArray.forEach(function(item) {
        dataProp = item['data'].replace(/\s/g, '').replace(/\./g, '');
        item['data'] = dataProp;
    });
    return jsonArray;
}
Answer 0 (score: 2)
To print the JSON properly as a string, try
print(json.dumps(json_data))
See also https://docs.python.org/2/library/json.html#json.dumps
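The u prefix is only how Python 2 displays unicode strings when a dict is printed; it is not part of the data itself. A minimal sketch of the difference follows, where `sample` is a hypothetical stand-in for the loaded file:

```python
import json

# Hypothetical stand-in for the data loaded from the file.
sample = {"columns": [{"data": "Doc.", "title": "Doc."}]}

# repr() is what print(some_dict) shows; on Python 2 it includes u'' prefixes.
as_repr = repr(sample)

# json.dumps() serializes to real JSON text with double-quoted strings.
as_json = json.dumps(sample)

print(as_repr)
print(as_json)

# The serialized text parses back to an equal structure.
assert json.loads(as_json) == sample
```

Only the output of json.dumps should be sent to the server; the printed dict is a debugging view.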
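To remove certain characters from a string, you can do the obvious thing:
string = string.replace(".", "").replace(" ", "")
or, more efficiently, use str.translate (this example only works in Python 2, and only on byte strings (str), not on unicode objects):
string = string.translate(None, " .")
or use a regular expression with re.sub:
import re
string = re.sub(r"[ .]", "", string)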
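On Python 3 the two-argument form of translate no longer exists. One equivalent, shown here as a sketch, builds a deletion table with str.maketrans; the helper name `strip_chars` is hypothetical, and the sample string is one of the column names from the question.

```python
import re

# The third argument of str.maketrans lists the characters to delete.
DELETE_SPACE_AND_DOT = str.maketrans("", "", " .")

def strip_chars(text):
    """Remove spaces and dots from text (Python 3 only)."""
    return text.translate(DELETE_SPACE_AND_DOT)

print(strip_chars("Order no."))  # Orderno

# Both approaches give the same result for this input.
assert strip_chars("Order no.") == re.sub(r"[ .]", "", "Order no.")
```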
Then just use a nice comprehension to go through the whole dictionary (use items() with Python 3):
sanitize = lambda s: re.sub(r"[ .]", "", s)
table = {sanitize(k): sanitize(v) for k, v in table.iteritems()}
But this only works for shallow dictionaries (and only when every value is a string). That said, your solution does not look like it handles deeply nested structures either. If you do need that, use some recursion (for Python 3, use items() instead of iteritems() and str instead of basestring):
def sanitize(value):
    if isinstance(value, dict):
        value = {sanitize(k): sanitize(v) for k, v in value.iteritems()}
    elif isinstance(value, list):
        value = [sanitize(v) for v in value]
    elif isinstance(value, basestring):
        value = re.sub(r"[ .]", "", value)
    return value

table = sanitize(table)
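For the structure in the question, a closer Python counterpart of the two JavaScript functions might look like the sketch below. The names `sanitize_key`, `sanitize_data` and `sanitize_columns` are hypothetical. Like the JavaScript, it removes all whitespace (\s) and dots, but only from the keys of each row and from each column's data property; values stay untouched. Unlike the JavaScript, it returns new lists instead of modifying the input in place.

```python
import re

def sanitize_key(key):
    """Drop all whitespace and dots, like the JavaScript replace() calls."""
    return re.sub(r"[\s.]", "", key)

def sanitize_data(rows):
    """Return new row dicts whose keys are sanitized; values are unchanged."""
    return [{sanitize_key(k): v for k, v in row.items()} for row in rows]

def sanitize_columns(columns):
    """Return new column dicts with a sanitized 'data' property."""
    return [dict(col, data=sanitize_key(col["data"])) for col in columns]

# Shortened copy of the question's data, used only for illustration.
table = {
    "columns": [{"data": "Order no.", "title": "Order no."}],
    "data": [{"Doc.": "564251422", "Nothing": 0.0, "Order no.": "56421"}],
}
table["data"] = sanitize_data(table["data"])
table["columns"] = sanitize_columns(table["columns"])
print(table)
```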
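Answer 1 (score: 1)
Example:
import json

# Load the file, then serialize it back to a JSON string for printing.
with open('data.json', 'r') as json_file:
    json_d = json.load(json_file)
json_d = json.dumps(json_d)
print(json_d)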
Answer 2 (score: 1)
I would also like to improve on the excellent solutions by @Felk and @jlaur.
In my case, Windows event logs contained unknown control characters that were not sanitized properly.
Here is my version, which removes all control characters. Because of the type hints it is compatible with Python 3.6+ (they can be removed to make it compatible with any Python 3.x again).
import re
from typing import Union

def json_sanitize(value: Union[str, dict, list], is_value=True) -> Union[str, dict, list]:
    """
    Modified version of https://stackoverflow.com/a/45526935/2635443
    Recursive function that allows to remove any special characters from json, especially unknown control characters
    """
    if isinstance(value, dict):
        value = {json_sanitize(k, False): json_sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [json_sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            # Remove dots from key names
            value = re.sub(r"[.]", "", value)
        else:
            # Replace every control character with a space
            value = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', value)
    return value
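As a quick illustration of the control-character pattern used in the function above, here it is applied to a single made-up string (the sample text and the name `CONTROL_CHARS` are hypothetical):

```python
import re

# Same character class as above: C0 controls, DEL, and C1 controls.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")

raw = "Service\x00started\x1f"
clean = CONTROL_CHARS.sub(" ", raw)
print(repr(clean))
```

Note that the range \x00-\x1f also covers tab, newline and carriage return, so those are replaced with spaces as well.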
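Answer 3 (score: 0)
I just want to add a variation on @Felk's excellent solution. I had a bunch of keys with dots in them. @Felk's solution removed the dots from the keys but also from the values, which I did not want. So for anyone like me who lands on this post looking for a solution that sanitizes only the keys:
import re

def sanitize(value, is_value=True):
    if isinstance(value, dict):
        value = {sanitize(k, False): sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            # Only keys are changed; values pass through untouched.
            value = re.sub(r"[.]", "", value)
    return value

table = sanitize(table)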