So, every day I generate a simple set of counts in a Pandas DataFrame. I want to be able to timestamp it, add it to another DataFrame loaded from a JSON file, and then save it back out as JSON. What I'm really struggling with is finding the right DataFrame structure and JSON format to make this work. Currently, my program builds my DataFrame each day like this:
  Condition  Count
0       EPN     20
1       LOA     35
2       EMS     15
3       PPM      7
I need to combine it with a DataFrame pulled from a JSON file that I'd like to look something like this:
               EMS  EPN  LOA
1543867981.55    5   17   18
So that joining them looks like this:
               EMS  EPN  LOA  PPM
1543867981.55    5   17   18  NaN
1543932370.90   15   20   35    7
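That join can be sketched like this (a minimal, self-contained version with the data from above hard-coded; in practice the timestamp index label would come from `str(time.time())`):

```python
import pandas as pd

# Yesterday's data, indexed by timestamp (as it would be loaded from the JSON file)
old_df = pd.DataFrame({'EMS': [5], 'EPN': [17], 'LOA': [18]},
                      index=['1543867981.55'])

# Today's counts, one row per condition
new_df = pd.DataFrame({'Condition': ['EPN', 'LOA', 'EMS', 'PPM'],
                       'Count': [20, 35, 15, 7]})

# Reshape today's counts into one timestamped row and append
row = new_df.set_index('Condition').T
row.index = ['1543932370.90']  # in practice: str(time.time())
combined = pd.concat([old_df, row], sort=True)
print(combined)
```

`sort=True` aligns the columns alphabetically, and the missing `PPM` for the older row comes through as `NaN`, matching the table above.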
I'm trying to save it in this JSON format:
{"Time": "1543867981.55","Conditions":[{"EMS":5,"EPN":17,"LOA":18}],
"Time": "1543932370.90","Conditions":[{"EMS":15,"EPN":20,"LOA":35,"PPM":7}]}
So far I haven't been able to crack it. Here's what I have:
import json
import time
import pandas as pd
from pandas.io.json import json_normalize

new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(time.time())]
new_df = new_df.transpose()  # I think I am now in my preferred format

# The closest I can get to loading in the dataframe from the JSON file
with open("/filepath/sample.json") as f:
    d = json.load(f)
old_df = json_normalize(d['Conditions'])

# doesn't bring in timestamp as index, but if it did I would continue with:
final_df = pd.concat([new_df, old_df], sort=True)
final_df.to_json("/filepath/sample.json", orient='index')
But this stores the JSON like this:
{"1543867981.55":{"EMS":5,"EPN":17,"LOA":18,"PPM":null},
"1543932370.90":{"EMS":15,"EPN":20,"LOA":35,"PPM":7}}
To be clear, my only goals are: timestamp the daily DataFrame, combine it with the data from previous days' runs into a single DataFrame (so I can generate graphs of the various "Conditions"), and store the newly combined data. I chose JSON because I thought it was the cleanest way to store the data and might find other uses for it, but that may have been a mistake.
Edit: I had a deadline, so I moved on; this isn't exactly the way I wanted it. My program works, but I had to give up on the nested JSON. I'm still interested in an answer if anyone has one. For reference, this is what I'm currently doing:
new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(round(time.time(), 0))]
new_df = new_df.transpose()

old_df = pd.read_json("/filepath/sample.json", orient='index')
final_df = pd.concat([new_df, old_df], sort=True)
final_df.to_json("/filepath/sample.json", orient='index')
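Reading a nested Time/Conditions file back into a timestamp-indexed frame could look roughly like this (a sketch; it assumes the file holds a top-level JSON list of records, since a single object can't legally repeat the `"Time"` key):

```python
import pandas as pd

# What json.load(f) would return for a list-of-records file
records = [
    {'Time': '1543867981.55', 'Conditions': [{'EMS': 5, 'EPN': 17, 'LOA': 18}]},
    {'Time': '1543932370.90', 'Conditions': [{'EMS': 15, 'EPN': 20, 'LOA': 35, 'PPM': 7}]},
]

# Flatten each record's single Conditions dict and index by Time
old_df = pd.DataFrame([r['Conditions'][0] for r in records],
                      index=[r['Time'] for r in records])
print(old_df)
```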
Answer 0 (score: 1)
OK, a few things about your question:
JSON really isn't an efficient way to store this data. It looks like you only have 2 dimensions (time and condition), so why not store Condition as rows? In pandas you can pivot (stack/unstack) the dimensions however you need. In my experience, you mostly only need to store data as JSON when the schema is unknown or likely to change.
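A quick sketch of that stack/unstack pivoting on made-up counts (note that `stack()`'s handling of the `NaN` holes varies across pandas versions):

```python
import pandas as pd

# Long format: one row per (date, condition) pair
df = pd.DataFrame({
    'date': ['2018-12-09', '2018-12-09', '2018-12-10'],
    'condition': ['EMS', 'EPN', 'EMS'],
    'count': [15, 20, 25],
}).set_index(['date', 'condition'])

# Wide: one column per condition; missing pairs become NaN
wide = df['count'].unstack('condition')
print(wide)

# And back to long again
long_again = wide.stack()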
If you already have a process that collects daily counts with a timestamp, you should append them to your existing DataFrame like this:
import pandas as pd
import numpy as np
import string
from datetime import datetime as dt

# Generate some random three-letter condition codes
letters = list(string.ascii_uppercase)
conditions = []
for cond in range(20):
    cond = ''.join(list(np.random.choice(letters, 3)))
    conditions.append(cond)

# Day 1: a random subset of conditions with random counts
conds = list(np.random.choice(conditions, np.random.randint(3, 6)))
counts = list(np.random.randint(1, 100, size=(len(conds))))
ts = (1544479493.979077 - 87400)
df = pd.DataFrame({'date': dt.fromtimestamp(ts).date(),
                   'timestamp': dt.fromtimestamp(ts),
                   'conditions': conds,
                   'counts': counts})
df.set_index('date', inplace=True)

# Day 2: another random subset, appended to the same frame
conds = list(np.random.choice(conditions, np.random.randint(3, 6)))
counts = list(np.random.randint(1, 100, size=(len(conds))))
ts = (1544479493.979077)
df1 = pd.DataFrame({'date': dt.fromtimestamp(ts).date(),
                    'timestamp': dt.fromtimestamp(ts),
                    'conditions': conds,
                    'counts': counts})
df1.set_index('date', inplace=True)

df = df.append(df1)
print(df)
                            timestamp conditions  counts
date
2018-12-09 2018-12-09 13:48:13.979077        DWX      48
2018-12-09 2018-12-09 13:48:13.979077        TJC      95
2018-12-09 2018-12-09 13:48:13.979077        MFV       7
2018-12-10 2018-12-10 14:04:53.979077        AZQ      96
2018-12-10 2018-12-10 14:04:53.979077        BGX      23
2018-12-10 2018-12-10 14:04:53.979077        UFU      43
2018-12-10 2018-12-10 14:04:53.979077        WLT      85
Now that your data is in this format, you can easily pivot it however you want:
df.groupby(['date','conditions']).sum().unstack('conditions')
            counts
conditions     AZQ   BGX   DWX  MFV   TJC   UFU   WLT
date
2018-12-09     NaN   NaN  48.0  7.0  95.0   NaN   NaN
2018-12-10    96.0  23.0   NaN  NaN   NaN  43.0  85.0
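If you do drop JSON in favor of the long row format, persisting each day's rows is just a header-less CSV append. A sketch using an in-memory buffer to stand in for the file (a real path would use `to_csv(path, mode='a', header=False)` after the first write):

```python
import io
import pandas as pd

# Existing history, already on disk with a header row
history = pd.DataFrame({
    'date': ['2018-12-09', '2018-12-09'],
    'conditions': ['DWX', 'TJC'],
    'counts': [48, 95],
})
buf = io.StringIO()
history.to_csv(buf, index=False)

# A new day's counts: append rows only, no second header
new_day = pd.DataFrame({'date': ['2018-12-10'],
                        'conditions': ['AZQ'],
                        'counts': [96]})
new_day.to_csv(buf, index=False, header=False)

# Reading the whole file back gives one long frame
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded)
```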