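Python: Sanitizing JSON data before sending

Time: 2015-11-24 17:41:04

Tags: python json string replace char

I have a JSON file that I need to send. Before sending it, I need to validate it and replace some special characters (spaces and dots (.)).

The problem is that Python inserts a u character before every string, which the server cannot read. How can I remove the u characters and sanitize the data (character replacement)?

Original JSON: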

{
    "columns": [
        {
            "data": "Doc.",
            "title": "Doc."
        },
        {
            "data": "Order no.",
            "title": "Order no."
        },
        {
            "data": "Nothing",
            "title": "Nothing"
        }
    ],
    "data": [
        {
            "Doc.": "564251422",
            "Nothing": 0.0,
            "Order no.": "56421"
        },
        {
            "Doc.": "546546545",
            "Nothing": 0.0,
            "Order no.": "98745"
        }
    ]
}

Python:

import json
def func():
    with open('json/simpledata.json', 'r') as json_file:
        json_data = json.load(json_file)
        print(json_data)
func()

Output:

{u'data': [{u'Nothing': 0.0, u'Order no.': u'56421', u'Doc.': u'564251422'}, {u'Nothing': 0.0, u'Order no.': u'98745', u'Doc.': u'546546545'}], u'columns': [{u'data': u'Doc.', u'title': u'Doc.'}, {u'data': u'Order no.', u'title': u'Order no.'}, {u'data': u'Nothing', u'title': u'Nothing'}]}

What I want to achieve in Python (shown here in JavaScript):

    sanitizeData: function(jsonArray) {
        var newKey;
        jsonArray.forEach(function(item) {
            for (key in item) {
                newKey = key.replace(/\s/g, '').replace(/\./g, '');
                if (key != newKey) {
                    item[newKey] = item[key];
                    delete item[key];
                }
            }
        })
        return jsonArray;
    },
    // remove whitespace and dots from data : <propName> references
    sanitizeColumns: function(jsonArray) {
        var dataProp = [];
        jsonArray.forEach(function(item) {
            dataProp = item['data'].replace(/\s/g, '').replace(/\./g, '');
            item['data'] = dataProp;
        })
        return jsonArray;
    }

4 answers:

Answer 0 (score: 2)

To print the JSON properly as a string, try print(json.dumps(json_data)).

See also https://docs.python.org/2/library/json.html#json.dumps
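The u prefix in the question's output is only how Python 2 displays unicode strings when a dict is printed; it is not part of the data. The sketch below (written for Python 3, with an inline sample string standing in for the file, which is an assumption) contrasts the Python representation of the parsed object with the JSON text that json.dumps produces.

```python
import json

# Inline sample standing in for the contents of json/simpledata.json (assumption).
raw = '{"columns": [{"data": "Doc.", "title": "Doc."}], "data": [{"Doc.": "564251422", "Nothing": 0.0}]}'

json_data = json.loads(raw)       # Python objects: dict, list, str, float
as_repr = repr(json_data)         # what print(json_data) shows: Python syntax, not JSON
as_json = json.dumps(json_data)   # JSON text, suitable for sending to a server

print(as_repr)
print(as_json)
```

Only the json.dumps result should be sent; the printed dict is meant for debugging.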

To remove certain characters from a string, you can do the obvious thing:

string = string.replace(".", "").replace(" ", "")

Or, more efficiently, use str.translate (this example only works in Python 2):

string = string.translate(None, " .")
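In Python 3, str.translate takes a single translation table, so the two-argument call above does not work there. A minimal sketch of the equivalent, using str.maketrans to build a table that deletes spaces and dots (the helper name strip_spaces_and_dots is made up for illustration):

```python
# Python 3: the third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans("", "", " .")

def strip_spaces_and_dots(text):
    """Return text with every space and dot removed (hypothetical helper)."""
    return text.translate(delete_table)

print(strip_spaces_and_dots("Order no."))
```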

Or use a regular expression with re.sub:

import re
string = re.sub(r"[ .]", "", string)

Then just use a nice comprehension to walk the whole dictionary (use items() with Python 3):

sanitize = lambda s: re.sub(r"[ .]", "", s)
table = {sanitize(k):sanitize(v) for k, v in table.iteritems()}

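But this only works for shallow dictionaries whose keys and values are all strings. Then again, your solution does not look like it handles deeply nested structures either. If you do need that, use some recursion (for Python 3, use items() instead of iteritems() and str instead of basestring):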

def sanitize(value):
    if isinstance(value, dict):
        value = {sanitize(k):sanitize(v) for k, v in value.iteritems()}
    elif isinstance(value, list):
        value = [sanitize(v) for v in value]
    elif isinstance(value, basestring):
        value = re.sub(r"[ .]", "", value)
    return value
table = sanitize(table)
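For the exact structure in the question, a narrower Python 3 sketch that mirrors the two JavaScript functions may be enough. The names clean_name, sanitize_data and sanitize_columns are hypothetical, and unlike the JavaScript this version returns new lists instead of modifying the input in place.

```python
import json
import re

def clean_name(name):
    """Remove all whitespace and dots from a name."""
    return re.sub(r"[\s.]", "", name)

def sanitize_data(rows):
    """Rename the keys of each row; values are left untouched."""
    return [{clean_name(key): value for key, value in row.items()} for row in rows]

def sanitize_columns(columns):
    """Clean only the 'data' property of each column; 'title' keeps its text."""
    return [dict(column, data=clean_name(column["data"])) for column in columns]

# Shortened sample in the shape of the question's JSON.
payload = {
    "columns": [{"data": "Order no.", "title": "Order no."}],
    "data": [{"Doc.": "564251422", "Nothing": 0.0, "Order no.": "56421"}],
}
cleaned = {
    "columns": sanitize_columns(payload["columns"]),
    "data": sanitize_data(payload["data"]),
}
print(json.dumps(cleaned, sort_keys=True))
```

Because only keys and the 'data' references are cleaned, values such as "564251422" and the column titles stay exactly as they were.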

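Answer 1 (score: 1)

Example:

import json

# Parse the file into Python objects, then serialize them back to a JSON string
with open('data.json', 'r') as json_file:
    json_d = json.load(json_file)
json_d = json.dumps(json_d)
print(json_d)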

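Answer 2 (score: 1)

I would also like to improve on the excellent solutions by @Felk and @jlaur.

In my case, Windows event logs contain unknown control characters that are not handled properly.

Here is my version, which removes all control characters. Because of the type hints it is compatible with Python 3.6+ (the hints can be removed to make it compatible with Python 3.x again).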

import re
from typing import Union

def json_sanitize(value: Union[str, dict, list], is_value=True) -> Union[str, dict, list]:
    """
    Modified version of https://stackoverflow.com/a/45526935/2635443

    Recursive function that allows to remove any special characters from json, especially unknown control characters
    """
    if isinstance(value, dict):
        value = {json_sanitize(k, False):json_sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [json_sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            # Remove dots from key names
            value = re.sub(r"[.]", "", value)
        else:
            # Remove all control characters
            value = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', value)
    return value
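To see what the control-character pattern in json_sanitize actually matches, here is a small sketch on a made-up sample string. Note that the range \x00-\x1f includes tab and newline, so those are replaced with spaces as well.

```python
import re

# Same character class as in json_sanitize: C0 controls, DEL, and C1 controls.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")

# Made-up sample text containing NUL, ESC, a tab, and a newline.
sample = "Event\x00 log\x1b entry\tcolumn\nnext line"
cleaned = CONTROL_CHARS.sub(" ", sample)

print(repr(cleaned))
```

If tabs or newlines inside values need to survive, the character class would have to be narrowed accordingly.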

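Answer 3 (score: 0)

I just want to add a variation on @Felk's excellent solution. I have a bunch of keys containing dots. @Felk's solution removes the dots from the keys, but also from the values, which I did not want. So, for anyone like me who lands on this post, here is a solution that sanitizes only the keys.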

import re

def sanitize(value, is_value=True):
    if isinstance(value, dict):
        value = {sanitize(k,False):sanitize(v,True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            value = re.sub(r"[.]", "", value)
    return value

table = sanitize(table)
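A short usage sketch of this keys-only variant on data shaped like the question's JSON (the function is repeated so the example runs on its own). It shows two things worth knowing: spaces in keys are kept, since the pattern only matches dots, and the "data" references in columns are values, so they keep their dots.

```python
import re

def sanitize(value, is_value=True):
    # Copy of the keys-only function above so this example is self-contained.
    if isinstance(value, dict):
        value = {sanitize(k, False): sanitize(v, True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            value = re.sub(r"[.]", "", value)
    return value

table = {
    "columns": [{"data": "Doc.", "title": "Doc."}],
    "data": [{"Doc.": "564251422", "Order no.": "56421"}],
}
result = sanitize(table)
print(result)
```

For the question's use case the column references would therefore still need separate cleaning to match the renamed keys.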