Convert nested JSON to CSV in Python

Time: 2021-02-14 18:24:08

Tags: python json pandas csv

I am trying to convert a complex (nested) JSON document to CSV:

{
"caudal": [
{"ts": 1612746051248, "value": "0.0"}, 
{"ts": 1612745450856, "value": "0.0"}, 
{"ts": 1612744250898, "value": "0.0"}, 
{"ts": 1612743650861, "value": "0.0"}, 
{"ts": 1612743050821, "value": "0.0"} 
], 
"FreeHeap": [
{"ts": 1612746051248, "value": "247564"}, 
{"ts": 1612745450856, "value": "247564"}, 
{"ts": 1612744250898, "value": "247564"}, 
{"ts": 1612743650861, "value": "247564"}, 
{"ts": 1612743050821, "value": "247564"} 
], 
"MinimoFreeHeap": [
{"ts": 1612746051248, "value": "237440"}, 
{"ts": 1612745450856, "value": "237440"}, 
{"ts": 1612744250898, "value": "237440"}, 
{"ts": 1612743650861, "value": "237440"}, 
{"ts": 1612743050821, "value": "237440"} 
]
} 

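The JSON files my program has to process contain many more records, but I trimmed this one down to keep the analysis simple. I tried using the pandas library as follows: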

import pandas as pd

with open('read.json') as f_input:
    df = pd.read_json(f_input)

df.to_csv('out.csv', encoding='utf-8', index=False)

I get the following result:

caudal,FreeHeap,MinimoFreeHeap
"{'ts': 1612746051248, 'value': '0.0'}","{'ts': 1612746051248, 'value': '247564'}","{'ts': 1612746051248, 'value': '237440'}"
"{'ts': 1612745450856, 'value': '0.0'}","{'ts': 1612745450856, 'value': '247564'}","{'ts': 1612745450856, 'value': '237440'}"
"{'ts': 1612744250898, 'value': '0.0'}","{'ts': 1612744250898, 'value': '247564'}","{'ts': 1612744250898, 'value': '237440'}"
"{'ts': 1612743650861, 'value': '0.0'}","{'ts': 1612743650861, 'value': '247564'}","{'ts': 1612743650861, 'value': '237440'}"
"{'ts': 1612743050821, 'value': '0.0'}","{'ts': 1612743050821, 'value': '247564'}","{'ts': 1612743050821, 'value': '237440'}"

As you can see, each cell contains something like:

"{'ts': 1612743050821, 'value': '247564'}"

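As I understand it, that is another JSON object. Is there a simple way to add a column named timestamp (ts) and put only the value in the cell where this JSON object is now? I believe this would be the right approach. My goal is to convert the information in the JSON into CSV format so that third parties (databases or AI algorithms) can use it more easily. If you can think of a more convenient approach or format, though, I am willing to change my original idea. I have to admit I am new to this world.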

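I thought about walking through the JSON and doing the conversion by hand. However, it is hard to match up the measurements that share the same timestamp.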
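The timestamp matching described above can be sketched with pandas alone. Each top-level key becomes its own column, every series is indexed by ts, and pandas aligns the rows that share a timestamp. This is a minimal sketch under a few assumptions. The JSON must have exactly the shape shown in the question: top-level keys mapping to lists of {"ts", "value"} records. The helper name nested_json_to_wide is hypothetical. Values are converted with pd.to_numeric, which assumes every value is a numeric string.

```python
import json

import pandas as pd


def nested_json_to_wide(data: dict) -> pd.DataFrame:
    """Hypothetical helper: one row per timestamp, one column per measurement."""
    columns = []
    for name, records in data.items():
        # Each record is {"ts": <epoch milliseconds>, "value": "<number as string>"}
        frame = pd.DataFrame(records).set_index("ts")
        # Assumption: every value is a numeric string such as "0.0" or "247564"
        columns.append(pd.to_numeric(frame["value"]).rename(name))
    # concat(axis=1) aligns the series on their shared "ts" index;
    # a timestamp missing from one series becomes NaN in that column
    wide = pd.concat(columns, axis=1).sort_index()
    wide.index.name = "ts"
    return wide.reset_index()


# A two-timestamp excerpt of the question's data
raw = """{
  "caudal": [{"ts": 1612746051248, "value": "0.0"},
             {"ts": 1612745450856, "value": "0.0"}],
  "FreeHeap": [{"ts": 1612746051248, "value": "247564"},
               {"ts": 1612745450856, "value": "247564"}],
  "MinimoFreeHeap": [{"ts": 1612746051248, "value": "237440"},
                     {"ts": 1612745450856, "value": "237440"}]
}"""

wide = nested_json_to_wide(json.loads(raw))
print(wide.to_csv(index=False))
```

Every CSV row then holds one timestamp and one column per measurement. A database or a model can consume that directly. Pass a path to to_csv to write a file instead of printing.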

3 answers:

Answer 0 (score: 1)

Nicolas,

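You did not say what shape you want the data in. The code below converts it to a table with three columns: machine (not sure that is the right term), ts, and value.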

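import pandas as pd
import json

with open('read.json') as f_input:
    data = json.load(f_input)

# each top-level key becomes a column whose cells are {'ts': ..., 'value': ...} dicts
df = pd.DataFrame.from_dict(data, orient='columns')

# reuse the name for the list of flattened rows
data = []

for col in df.columns:
    # Series.iteritems() was removed in pandas 2.0; .items() works in all versions
    for index, row in df[col].items():
        # look the fields up by name instead of relying on the dict's key order
        data.append({'machine': col, 'ts': row['ts'], 'value': row['value']})

# DataFrame.append() was removed in pandas 2.0; build the frame from the list of rows
df_new = pd.DataFrame(data, columns=['machine', 'ts', 'value'])

df_new.to_csv('out.csv', encoding='utf-8', index=False)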

If you want one row per timestamp and one column per machine, change the last two lines to this:

df_new = pd.DataFrame(data).pivot(index='ts', columns='machine', values='value')

df_new.to_csv('out.csv', encoding='utf-8')
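One caveat about the pivot step: DataFrame.pivot raises a ValueError when the same (ts, machine) pair appears more than once. That could happen, for example, if an export contains repeated records. The sketch below shows a possible workaround, assuming it is acceptable to keep the first reading for each pair: pivot_table with aggfunc='first'. The sample rows reuse values from the question, and the last row is a deliberately repeated record.

```python
import pandas as pd

# Long-form rows shaped like the ones built above; the last row deliberately
# repeats the (ts, machine) pair of the first one to trigger the duplicate case
data = [
    {"machine": "caudal", "ts": 1612746051248, "value": "0.0"},
    {"machine": "FreeHeap", "ts": 1612746051248, "value": "247564"},
    {"machine": "caudal", "ts": 1612745450856, "value": "0.0"},
    {"machine": "FreeHeap", "ts": 1612745450856, "value": "247564"},
    {"machine": "caudal", "ts": 1612746051248, "value": "0.0"},
]
long_df = pd.DataFrame(data)

try:
    long_df.pivot(index="ts", columns="machine", values="value")
except ValueError as err:
    # pivot refuses to reshape when an index/column pair occurs twice
    print("pivot failed:", err)

# pivot_table aggregates duplicates instead; "first" keeps the first row seen per pair
wide = long_df.pivot_table(index="ts", columns="machine", values="value", aggfunc="first")
print(wide)
```

Whether "first" is the right rule depends on the data. If duplicates can carry different readings, another aggregation (or deduplicating upstream) may be more appropriate.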

Answer 1 (score: 1)

  • According to this timing analysis and question, pd.DataFrame(df[col].values.tolist()) is the fastest way to normalize a single level of a column. This answer, however, shows how to handle problematic columns (for example, ones that raise an error when .values.tolist() is attempted).

import pandas as pd

# read the json file
with open('read.json') as f_input:
    df = pd.read_json(f_input)

# collect one normalized frame per original column
frames = []

# iterate through each column, normalize it, and collect it
for col in df.columns:
    normed = pd.DataFrame(df[col].values.tolist())  # normalize the column from df
    normed['type'] = col  # add the original column name as a new column so the associated values can be identified
    frames.append(normed)

# DataFrame.append was removed in pandas 2.0; concatenate once instead (ignore_index also resets the index)
normed_df = pd.concat(frames, ignore_index=True)

# convert ts to a datetime dtype
normed_df.ts = pd.to_datetime(normed_df.ts, unit='ms')

# save this long form to a csv
normed_df.to_csv('long.csv', index=False)

# display(normed_df)
                         ts   value            type
0   2021-02-08 01:00:51.248     0.0          caudal
1   2021-02-08 00:50:50.856     0.0          caudal
2   2021-02-08 00:30:50.898     0.0          caudal
3   2021-02-08 00:20:50.861     0.0          caudal
4   2021-02-08 00:10:50.821     0.0          caudal
5   2021-02-08 01:00:51.248  247564        FreeHeap
6   2021-02-08 00:50:50.856  247564        FreeHeap
7   2021-02-08 00:30:50.898  247564        FreeHeap
8   2021-02-08 00:20:50.861  247564        FreeHeap
9   2021-02-08 00:10:50.821  247564        FreeHeap
10  2021-02-08 01:00:51.248  237440  MinimoFreeHeap
11  2021-02-08 00:50:50.856  237440  MinimoFreeHeap
12  2021-02-08 00:30:50.898  237440  MinimoFreeHeap
13  2021-02-08 00:20:50.861  237440  MinimoFreeHeap
14  2021-02-08 00:10:50.821  237440  MinimoFreeHeap

  • Use .pivot to align the data, with ts as the index.
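Below is a minimal sketch of that pivot. It assumes the long-form normed_df (columns ts, value, type) built above. So that the block runs on its own, the frame is rebuilt here from two of the question's timestamps. The pd.to_numeric step is an addition not in the answer, and it assumes every value is a numeric string.

```python
import pandas as pd

# Stand-in for normed_df: long form with ts (datetime), value (string) and type columns
normed_df = pd.DataFrame({
    "ts": pd.to_datetime([1612746051248, 1612745450856] * 3, unit="ms"),
    "value": ["0.0", "0.0", "247564", "247564", "237440", "237440"],
    "type": ["caudal"] * 2 + ["FreeHeap"] * 2 + ["MinimoFreeHeap"] * 2,
})

# One row per timestamp, one column per original JSON key
wide_df = normed_df.pivot(index="ts", columns="type", values="value")

# The JSON stores numbers as strings; convert each column to a numeric dtype
wide_df = wide_df.apply(pd.to_numeric)

# to_csv() without a path returns the CSV text; pass e.g. 'wide.csv' to write a file
csv_text = wide_df.to_csv()
print(csv_text)
```

Converting the values here means a database or an ML library reading the CSV sees numbers rather than quoted text.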

Answer 2 (score: 0)

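I finally found a solution. There is a very interesting library called "cherrypicker". Using its examples and pandas DataFrames, I figured out how to make it work. Here is the code: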


I hope this is useful to someone in the future. I'm not sure it is the simplest way, but it works for me! Regards