Question

Consider the following code:

>>> import json
>>> data = {
...     'x': [1, {'$special': 'a'}, 2],
...     'y': {'$special': 'b'},
...     'z': {'p': True, 'q': False}
... }
>>> print(json.dumps(data, indent=2))
{
  "y": {
    "$special": "b"
  },
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {
      "$special": "a"
    },
    2
  ]
}

What I want is to format the JSON so that JSON objects that have only a single property '$special' are rendered on one line, as follows.

{
  "y": {"$special": "b"},
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {"$special": "a"},
    2
  ]
}

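I've started implementing a custom JSONEncoder and passing it to json.dumps as the cls argument, but both candidate methods on JSONEncoder have a problem:

The JSONEncoder default method is called for each part of data, but its return value is not a raw JSON string, so there doesn't appear to be any way to adjust the formatting.
The JSONEncoder encode method does return a raw JSON string, but it is only called once, for data as a whole.

Is there any way I can get JSONEncoder to do what I want?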
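To make the point about default concrete, here is a minimal experiment. TracingEncoder and its seen list are hypothetical names used only for this illustration. As far as the documented behaviour goes, default is consulted only for objects the encoder cannot serialize natively, and whatever it returns is a Python object that is encoded (and indented) like any other value, so it offers no hook for changing the layout.

```python
import json


class TracingEncoder(json.JSONEncoder):
    """Hypothetical encoder that records every object passed to default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, o):
        self.seen.append(o)
        if isinstance(o, set):
            # The return value is a plain Python object, not JSON text;
            # the encoder serializes and indents it like any other list.
            return sorted(o)
        return super().default(o)


# Plain dicts, lists and strings never reach default().
plain = TracingEncoder(indent=2)
plain.encode({'y': {'$special': 'b'}})
print(plain.seen)

# A set is not natively serializable, so default() is called for it.
with_set = TracingEncoder(indent=2)
text = with_set.encode({'s': {3, 1, 2}})
print(with_set.seen)
print(text)
```

The list returned for the set still comes out spread over several lines, which is exactly the limitation described above.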

Answer 1

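The json module is not really designed to give you that much control over the output; indentation is mostly meant to aid readability while debugging.

Instead of making json produce the output, you could transform the output using the standard library tokenize module: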

import tokenize
from io import BytesIO


def inline_special(json_data):
    def adjust(t, ld):
        """Adjust token line number by offset"""
        (sl, sc), (el, ec) = t.start, t.end
        return t._replace(start=(sl + ld, sc), end=(el + ld, ec))

    def transform():
        with BytesIO(json_data.encode('utf8')) as b:
            held = []  # to defer newline tokens
            lastend = None  # to track the end pos of the prev token
            loffset = 0     # line offset to adjust tokens by
            tokens = tokenize.tokenize(b.readline)
            for tok in tokens:
                if tok.type == tokenize.NL:
                    # hold newlines until we know there's no special key coming
                    held.append(adjust(tok, loffset))
                elif (tok.type == tokenize.STRING and
                        tok.string == '"$special"'):
                    # special string, collate tokens until the next rbrace
                    # held newlines are discarded, adjust the line offset
                    loffset -= len(held)
                    held = []
                    text = [tok.string]
                    while tok.exact_type != tokenize.RBRACE:
                        tok = next(tokens)
                        if tok.type != tokenize.NL:
                            text.append(tok.string)
                            if tok.string in ':,':
                                text.append(' ')
                        else:
                            loffset -= 1  # following lines all shift
                    line, col = lastend
                    text = ''.join(text)
                    endcol = col + len(text)
                    yield tokenize.TokenInfo(
                        tokenize.STRING, text, (line, col), (line, endcol),
                        '')
                    # adjust any remaining tokens on this line
                    while tok.type != tokenize.NL:
                        tok = next(tokens)
                        yield tok._replace(
                            start=(line, endcol),
                            end=(line, endcol + len(tok.string)))
                        endcol += len(tok.string)
                else:
                    # uninteresting token, yield any held newlines
                    if held:
                        yield from held
                        held = []
                    # adjust and remember last position
                    tok = adjust(tok, loffset)
                    lastend = tok.end
                    yield tok

    return tokenize.untokenize(transform()).decode('utf8')

This successfully reformats your sample:

import json

data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}

>>> print(inline_special(json.dumps(data, indent=2)))
{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "p": true,
    "q": false
  }
}

Answer 2

I found the following regex-based solution to be the simplest, albeit ... regex-based.

import json
import re
data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}
text = json.dumps(data, indent=2)
pattern = re.compile(r"""
{
\s*
"\$special"
\s*
:
\s*
"
((?:[^"\\]|\\.)*)  # Captures zero or more non-quote characters or escape sequences
"
\s*
}
""", re.VERBOSE)
print(pattern.sub(r'{"$special": "\1"}', text))

The output follows.

{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "p": true,
    "q": false
  }
}
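As a hedged sketch of the same idea packaged as a function: the names SPECIAL_OBJECT and inline_special_regex below are made up for this example. The pattern captures the whole quoted value, including any escaped quotes inside it, and it assumes the '$special' value is always a string.

```python
import json
import re

# Matches an object whose only member is "$special" with a string value.
# [^"\\] is any character other than a quote or a backslash; \\. is any
# escape sequence, so escaped quotes inside the value are handled.
SPECIAL_OBJECT = re.compile(
    r'\{\s*"\$special"\s*:\s*("(?:[^"\\]|\\.)*")\s*\}')


def inline_special_regex(text):
    """Collapse single-key "$special" objects in indented JSON text."""
    # Using a function keeps the replacement free of template escape rules.
    return SPECIAL_OBJECT.sub(
        lambda m: '{"$special": ' + m.group(1) + '}', text)


data = {
    'x': [1, {'$special': 'say "hi"'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False},
}
result = inline_special_regex(json.dumps(data, indent=2))
print(result)
```

Parsing the result back with json.loads and comparing it to the input is a cheap way to confirm that the rewrite only touched whitespace.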

Answer 3

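You can do this, but you'd basically have to copy and modify a lot of the code from json.encoder, because the encoding functions aren't really designed to be partially overridden.

Basically, copy the whole of _make_iterencode from json.encoder and change it so that your special dictionary gets printed without newline indents. Then monkeypatch the json package to use your modified version, run the JSON dump, and undo the monkeypatch afterwards (if you want to).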

The _make_iterencode function is pretty long, so I've only posted the portions that need to be changed.

import json
import json.encoder

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr, ...
    def _iterencode_dict(dct, _current_indent_level):
        ...
        if _indent is not None:
            _current_indent_level += 1
            if '$special' in dct:
                # special dict: no newline or indentation between items
                newline_indent = ''
                item_separator = _item_separator
            else:
                newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
                item_separator = _item_separator + newline_indent
            yield newline_indent
        ...
        if newline_indent is not None:
            _current_indent_level -= 1
            if '$special' not in dct:
                # only regular dicts put the closing brace on its own line
                yield '\n' + (' ' * (_indent * _current_indent_level))

def main():
    data = {
        'x': [1, {'$special': 'a'}, 2],
        'y': {'$special': 'b'},
        'z': {'p': True, 'q': False},
    }

    # swap in the modified function, dump, then restore the original
    orig_make_iterencoder = json.encoder._make_iterencode
    json.encoder._make_iterencode = _make_iterencode
    print(json.dumps(data, indent=2))
    json.encoder._make_iterencode = orig_make_iterencoder
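If patching json.encoder feels too invasive, the same output can be produced by a small recursive serializer. To be clear, this is a different technique from the monkeypatch described above, not a completion of it. The function name dumps_inline is made up for this sketch, and it assumes dictionary keys are strings and the data contains only standard JSON types.

```python
import json


def dumps_inline(obj, indent=2, level=0):
    """Indented JSON, except {"$special": ...} objects stay on one line."""
    pad = ' ' * (indent * (level + 1))   # indentation of nested items
    closing_pad = ' ' * (indent * level)  # indentation of closing bracket
    if isinstance(obj, dict):
        if not obj:
            return '{}'
        if list(obj) == ['$special']:
            # Compact form for the special single-key object.
            return json.dumps(obj)
        items = [
            # Assumption: keys are strings, so json.dumps quotes them.
            pad + json.dumps(key) + ': ' + dumps_inline(value, indent, level + 1)
            for key, value in obj.items()
        ]
        return '{\n' + ',\n'.join(items) + '\n' + closing_pad + '}'
    if isinstance(obj, (list, tuple)):
        if not obj:
            return '[]'
        items = [pad + dumps_inline(value, indent, level + 1) for value in obj]
        return '[\n' + ',\n'.join(items) + '\n' + closing_pad + ']'
    # Scalars (str, int, float, bool, None) go through the standard encoder.
    return json.dumps(obj)


data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False},
}
print(dumps_inline(data))
```

The trade-off is that this bypasses JSONEncoder options such as sort_keys or a custom default, so it only suits data that is already plain JSON.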


Formatting certain JSON objects on one line
