Question

I have a DataFrame df

df:
col1    col2  col3
 1        2     3
 4        5     6
 7        8     9

The JSON I am looking for is:

 {
            "col1": 1,
            "col1": 4,
            "col1": 7,
        },
        {
            "col2": 2,
            "col2": 5,
            "col2": 8
        },
        {
            "col3": 3,
            "col3": 6,
            "col3": 9,
        }

I tried df.to_json, but it does not work:

df.to_json(orient='records')

It gives this output:

'[{"col1":1,"col2":2,"col3":3},{"col1":4,"col2":5,"col3":6},{"col1":7,"col2":8,"col3":9}]'

This is not the output I want.

How can I do this in the most efficient way using pandas / Python?

Answer 1

You need to do

df.to_json('file.json', orient='records')

Note that this will give you an array of objects, one per row:

[{"col1":1,"col2":2,"col3":3},{"col1":4,"col2":5,"col3":6},{"col1":7,"col2":8,"col3":9}]

You can also do

df.to_json('file.json', orient='records', lines=True)

if you want one object per line instead (JSON Lines, with no enclosing brackets or separating commas):

{"col1":1,"col2":2,"col3":3}
{"col1":4,"col2":5,"col3":6}
{"col1":7,"col2":8,"col3":9}

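To pretty-print the output with the jq command-line tool (installed through your system package manager, for example apt-get or brew):

sudo apt-get install jq
jq '.' file.json > new_file.json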
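As a minimal sketch of what the two calls above return, the following rebuilds the question's frame by hand (the literal column values are copied from the question; the variable names are only illustrative) and pretty-prints the result with the standard json module instead of jq:

```python
import json

import pandas as pd

# Example frame from the question, rebuilt by hand
df = pd.DataFrame({"col1": [1, 4, 7], "col2": [2, 5, 8], "col3": [3, 6, 9]})

# Without a path argument, to_json returns the JSON text as a string
records_json = df.to_json(orient="records")            # one JSON array of row objects
lines_json = df.to_json(orient="records", lines=True)  # one row object per line

# Pretty-print through the standard library
pretty = json.dumps(json.loads(records_json), indent=4)
print(pretty)
print(lines_json)
```

Both variants are keyed by row, so neither groups the values by column the way the question asks.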

Answer 2

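JSON objects are treated as dictionaries in Python. The JSON you specified has duplicate keys, so it can only be handled as a string (and not through the Python json library). The following code: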

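import json
from io import StringIO

import numpy as np
import pandas as pd

# Build the 3x3 example frame from the question
df = pd.DataFrame(np.arange(1, 10).reshape((3, 3)), columns=['col1', 'col2', 'col3'])
buffer = StringIO()
df.to_json(buffer, orient='columns')  # one key per column, values keyed by row index
parsed = json.loads(buffer.getvalue())
with open("pretty.json", "w") as of:
    json.dump(parsed, of, indent=4)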

will produce the following JSON:

{
    "col1": {
        "0": 1,
        "1": 4,
        "2": 7
    },
    "col2": {
        "0": 2,
        "1": 5,
        "2": 8
    },
    "col3": {
        "0": 3,
        "1": 6,
        "2": 9
    }
}

which you can later load back into Python. Alternatively, this script will generate exactly the string you want:

with open("exact.json", "w+") as of:
    of.write('[\n\t{\n' + '\t},\n\t{\n'.join(["".join(["\t\t\"%s\": %s,\n"%(c, df[c][i]) for i in df.index]) for c in df.columns])+'\t}\n]')

and the output will be:

[
    {
        "col1": 1,
        "col1": 4,
        "col1": 7,
    },
    {
        "col2": 2,
        "col2": 5,
        "col2": 8,
    },
    {
        "col3": 3,
        "col3": 6,
        "col3": 9,
    }
]

Edit: fixed brackets
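A more readable take on the same string-building idea might look like the sketch below. The helper name duplicate_key_blocks and its indent parameter are hypothetical, not part of pandas. Unlike the one-liner above, it leaves out the trailing commas and escapes keys and values with json.dumps, so the text at least parses as JSON:

```python
import json

import pandas as pd


def duplicate_key_blocks(df: pd.DataFrame, indent: str = "    ") -> str:
    """Hypothetical helper: one {...} block per column, repeating the column name as the key."""
    blocks = []
    for col in df.columns:
        key = json.dumps(str(col))  # quoted and escaped key
        # tolist() converts NumPy scalars to plain Python values that json.dumps accepts
        pairs = [f"{indent * 2}{key}: {json.dumps(value)}" for value in df[col].tolist()]
        blocks.append(f"{indent}{{\n" + ",\n".join(pairs) + f"\n{indent}}}")
    return "[\n" + ",\n".join(blocks) + "\n]"


df = pd.DataFrame({"col1": [1, 4, 7], "col2": [2, 5, 8], "col3": [3, 6, 9]})
text = duplicate_key_blocks(df)
print(text)
```

The result is still only useful as text: a standard parser collapses the repeated keys.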

Answer 3

This kind of JSON is valid but not recommended, and strongly so, because during deserialization you will lose every entry except the last one in each JSON array element.
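The data loss is easy to reproduce with the standard json module. In the sketch below, the sample string is one object from the question without the trailing comma. The last part shows one duplicate-free layout using to_dict(orient="list"), which is an assumption about what the consumer could accept rather than something the question asked for:

```python
import json

import pandas as pd

text = '{"col1": 1, "col1": 4, "col1": 7}'

# Default parsing builds a dict, so later duplicates overwrite earlier ones
parsed = json.loads(text)

# object_pairs_hook receives every (key, value) pair, so nothing is dropped
pairs = json.loads(text, object_pairs_hook=list)

print(parsed)
print(pairs)

# Assumed alternative: one key per column, values collected in a list
df = pd.DataFrame({"col1": [1, 4, 7], "col2": [2, 5, 8], "col3": [3, 6, 9]})
by_column = json.dumps(df.to_dict(orient="list"), indent=4)
print(by_column)
```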

How to create JSON from a Pandas DataFrame where the columns are the keys
