Loading, appending and storing nested JSON with Pandas

Asked: 2018-12-04 15:12:30

Tags: python json pandas

So I generate a simple set of counts in a Pandas DataFrame every day. I want to be able to timestamp it, add it to another DataFrame loaded from a JSON file, and then save the result back to that JSON file. What I'm really struggling with is finding the right DataFrame structure and JSON format to make that work. Currently, my program builds my DataFrame each day like this:

   Condition  Count
0  EPN        20
1  LOA        35
2  EMS        15
3  PPM        7
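GetTodaysCount is my own query and not important here; purely for reproducing the example, a minimal stand-in that returns the frame above might look like this:

import pandas as pd

def GetTodaysCount():
    # Stand-in for the real daily query; returns the example counts shown above.
    return pd.DataFrame({'Condition': ['EPN', 'LOA', 'EMS', 'PPM'],
                         'Count': [20, 35, 15, 7]})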

I need to combine it with a DataFrame pulled from a JSON file that I hope looks like this:

               EMS EPN LOA
1543867981.55  5   17  18

So joining them would look like this:

               EMS EPN LOA PPM
1543867981.55  5   17  18  NaN
1543932370.90  15  20  35  7

I am trying to save it in this JSON format:

{"Time": "1543867981.55","Conditions":[{"EMS":5,"EPN":17,"LOA":18}],
 "Time": "1543932370.90","Conditions":[{"EMS":15,"EPN":20,"LOA":35,"PPM":7}]}

So far I haven't been able to crack it.

import json
import time

import pandas as pd
from pandas.io.json import json_normalize

new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)  # without inplace the index was never actually set
new_df.columns = [str(time.time())]
new_df = new_df.transpose()  # I think I am now in my preferred format

# The closest I can get to loading in the dataframe from the JSON file
with open("/filepath/sample.json") as f:
    d = json.load(f)
old_df = json_normalize(d['Conditions'])
# doesn't bring in timestamp as index, but if it did I would continue with:
final_df = pd.concat([new_df, old_df], sort=True)
final_df.to_json("/filepath/sample.json", orient='index')

But this stores the JSON like this:

{"1543867981.55":{"EMS":5,"EPN":17,"LOA":18,"PPM":null},
 "1543932370.90":{"EMS":15,"EPN":20,"LOA":35,"PPM":7}}

To be clear, my only goals are: timestamp the daily DataFrame, combine it with previous days' runs into a single DataFrame (so I can generate charts of the various Conditions), and store the newly merged data. I chose JSON because I thought it would be the cleanest way to store the data and might let me find other uses for it, but that may have been a mistake.

Edit: I had a deadline, so I moved on with something that isn't exactly what I wanted. My program works, but I had to give up on the nested JSON. If anyone has an answer, I'm still interested. For reference, here is what I'm doing at the moment:

import time

import pandas as pd

new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(round(time.time(), 0))]
new_df = new_df.transpose()
old_json = pd.read_json("/filepath/sample.json", orient='index')
final_df = pd.concat([new_df, old_json], sort=True)  # old_json, not the undefined old_df
final_df.to_json("/filepath/sample.json", orient='index')
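For anyone who does want to take a crack at the nested version, the kind of round trip I was aiming for is sketched below. It is untested against my real data, and it assumes the file holds a list of {"Time", "Conditions"} records rather than the single object I wrote out above, since a plain JSON object can't usefully repeat the "Time" key.

import json
import time

import pandas as pd
from pandas.io.json import json_normalize  # pd.json_normalize in newer pandas

path = "/filepath/sample.json"

# Load the nested records and keep each record's Time as the index.
with open(path) as f:
    records = json.load(f)  # assumed shape: [{"Time": ..., "Conditions": [{...}]}, ...]
old_df = json_normalize(records, record_path='Conditions', meta=['Time']).set_index('Time')

# Today's counts, timestamped and transposed as before.
new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(round(time.time(), 0))]
new_df = new_df.transpose()
new_df.index.name = 'Time'

combined = pd.concat([new_df, old_df], sort=True)

# Write everything back out as a list of {"Time", "Conditions"} records.
out = []
for ts, row in combined.iterrows():
    # Drop NaN counts so each day only lists the conditions it actually saw.
    conditions = {cond: int(count) for cond, count in row.dropna().items()}
    out.append({"Time": str(ts), "Conditions": [conditions]})
with open(path, "w") as f:
    json.dump(out, f)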

1 Answer:

Answer 0 (score: 1)

OK, a few things about your question:

JSON really isn't an efficient way to store this data. It looks like you only have two dimensions (time and condition), so why not store the conditions as rows? In pandas you can pivot (stack/unstack) the dimensions however you need. In my experience, you mostly only need to store data as JSON when the schema can't be pinned down or is likely to change.
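A tiny example of that pivoting, with made-up numbers: stack() turns a wide date-by-condition frame into long rows, and unstack() turns it back.

import pandas as pd

wide = pd.DataFrame({'EMS': [5, 15], 'EPN': [17, 20]},
                    index=pd.Index(['2018-12-04', '2018-12-05'], name='date'))

long = wide.stack()      # MultiIndex (date, condition) -> count, one row per condition
print(long)
print(long.unstack())    # back to one column per condition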

If you already have a process that collects the daily counts against a timestamp, you should append them to an existing DataFrame, like this:

import pandas as pd
import numpy as np
import string
from datetime import datetime as dt

# Build a pool of random three-letter condition codes to simulate the data.
letters = list(string.ascii_uppercase)
conditions = []
for cond in range(20):
    cond = ''.join(list(np.random.choice(letters,3)))
    conditions.append(cond)

# "Yesterday": a few random conditions with random counts, indexed by date.
conds = list(np.random.choice(conditions,np.random.randint(3,6)))
counts = list(np.random.randint(1,100,size=(len(conds))))
ts = (1544479493.979077-87400)
df = pd.DataFrame({'date':dt.fromtimestamp(ts).date(), 'timestamp':dt.fromtimestamp(ts), 'conditions':conds, 'counts':counts})
df.set_index('date', inplace=True)

# "Today": built the same way, then appended to the running frame.
conds = list(np.random.choice(conditions,np.random.randint(3,6)))
counts = list(np.random.randint(1,100,size=(len(conds))))
ts = (1544479493.979077)
df1 = pd.DataFrame({'date':dt.fromtimestamp(ts).date(), 'timestamp':dt.fromtimestamp(ts), 'conditions':conds, 'counts':counts})
df1.set_index('date',inplace=True)

df = df.append(df1)  # newer pandas: df = pd.concat([df, df1])
print(df)

                            timestamp conditions  counts
date                                                    
2018-12-09 2018-12-09 13:48:13.979077        DWX      48
2018-12-09 2018-12-09 13:48:13.979077        TJC      95
2018-12-09 2018-12-09 13:48:13.979077        MFV       7
2018-12-10 2018-12-10 14:04:53.979077        AZQ      96
2018-12-10 2018-12-10 14:04:53.979077        BGX      23
2018-12-10 2018-12-10 14:04:53.979077        UFU      43
2018-12-10 2018-12-10 14:04:53.979077        WLT      85

Now that your data is in this format, you can easily pivot it however you need:

df.groupby(['date','conditions']).sum().unstack('conditions')

           counts                                   
conditions    AZQ   BGX   DWX  MFV   TJC   UFU   WLT
date                                                
2018-12-09    NaN   NaN  48.0  7.0  95.0   NaN   NaN
2018-12-10   96.0  23.0   NaN  NaN   NaN  43.0  85.0
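And since the stated goal was charting the conditions, the pivoted frame plots directly, while the long-format frame itself can be persisted in any flat store. A short follow-on sketch, continuing from the df built above (CSV and the file path are just illustrative choices, and matplotlib is assumed to be installed):

import matplotlib.pyplot as plt

# Pivot counts to one column per condition, then plot one line per condition over time.
pivoted = df.groupby(['date', 'conditions'])['counts'].sum().unstack('conditions')
pivoted.plot()
plt.show()

# Persist the long-format frame so future days can simply be appended.
df.to_csv("/filepath/daily_counts.csv")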