So, every day I generate a simple set of counts in a Pandas DataFrame. I want to be able to timestamp it, add it to another DataFrame loaded from a JSON file, and then save it back out as JSON. What I'm really struggling with is finding the right DataFrame structure and JSON format to make this work. Currently, my program builds my DataFrame each day like this:
  Condition  Count
0       EPN     20
1       LOA     35
2       EMS     15
3       PPM      7
I need to combine it with a DataFrame pulled from a JSON file that I'd like to look something like this:
               EMS  EPN  LOA
1543867981.55    5   17   18
So that joining them looks like this:
               EMS  EPN  LOA  PPM
1543867981.55    5   17   18  NaN
1543932370.90   15   20   35    7
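That join can be sketched like this (a minimal, self-contained version with the data from above hard-coded; in practice the timestamp index label would come from `str(time.time())`):

```python
import pandas as pd

# Yesterday's data, indexed by timestamp (as it would be loaded from the JSON file)
old_df = pd.DataFrame({'EMS': [5], 'EPN': [17], 'LOA': [18]},
                      index=['1543867981.55'])

# Today's counts, one row per condition
new_df = pd.DataFrame({'Condition': ['EPN', 'LOA', 'EMS', 'PPM'],
                       'Count': [20, 35, 15, 7]})

# Reshape today's counts into one timestamped row and append
row = new_df.set_index('Condition').T
row.index = ['1543932370.90']  # in practice: str(time.time())
combined = pd.concat([old_df, row], sort=True)
print(combined)
```

`sort=True` aligns the columns alphabetically, and the missing `PPM` for the older row comes through as `NaN`, matching the table above.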
I'm trying to save it in this JSON format:
{"Time": "1543867981.55","Conditions":[{"EMS":5,"EPN":17,"LOA":18}],
"Time": "1543932370.90","Conditions":[{"EMS":15,"EPN":20,"LOA":35,"PPM":7}]}
So far I haven't been able to crack it. Here's what I have:
import json
import time
import pandas as pd
from pandas.io.json import json_normalize

new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(time.time())]
new_df = new_df.transpose()  # I think I am now in my preferred format

# The closest I can get to loading in the dataframe from the JSON file
with open("/filepath/sample.json") as f:
    d = json.load(f)
old_df = json_normalize(d['Conditions'])

# doesn't bring in timestamp as index, but if it did I would continue with:
final_df = pd.concat([new_df, old_df], sort=True)
final_df.to_json("/filepath/sample.json", orient='index')
But this stores the JSON like this:
{"1543867981.55":{"EMS":5,"EPN":17,"LOA":18,"PPM":null},
"1543932370.90":{"EMS":15,"EPN":20,"LOA":35,"PPM":7}}
To be clear, my only goals are: timestamp the daily DataFrame, combine it with the data from previous days' runs into a single DataFrame (so I can generate graphs of the various "Conditions"), and store the newly combined data. I chose JSON because I thought it was the cleanest way to store the data and might find other uses for it, but that may have been a mistake.
Edit: I had a deadline, so I moved on; this isn't exactly the way I wanted it. My program works, but I had to give up on the nested JSON. I'm still interested in an answer if anyone has one. For reference, this is what I'm currently doing:
new_df = GetTodaysCount()
new_df.set_index('Condition', inplace=True)
new_df.columns = [str(round(time.time(), 0))]
new_df = new_df.transpose()

old_df = pd.read_json("/filepath/sample.json", orient='index')
final_df = pd.concat([new_df, old_df], sort=True)
final_df.to_json("/filepath/sample.json", orient='index')
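Reading a nested Time/Conditions file back into a timestamp-indexed frame could look roughly like this (a sketch; it assumes the file holds a top-level JSON list of records, since a single object can't legally repeat the `"Time"` key):

```python
import pandas as pd

# What json.load(f) would return for a list-of-records file
records = [
    {'Time': '1543867981.55', 'Conditions': [{'EMS': 5, 'EPN': 17, 'LOA': 18}]},
    {'Time': '1543932370.90', 'Conditions': [{'EMS': 15, 'EPN': 20, 'LOA': 35, 'PPM': 7}]},
]

# Flatten each record's single Conditions dict and index by Time
old_df = pd.DataFrame([r['Conditions'][0] for r in records],
                      index=[r['Time'] for r in records])
print(old_df)
```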
Answer 0 (score: 1)
OK, a few things about your question:
JSON really isn't an efficient way to store this data. It looks like you only have 2 dimensions (time and condition), so why not store Condition as rows? In pandas you can pivot (stack/unstack) the dimensions however you need. In my experience, you mostly only need to store data as JSON when the schema is unknown or likely to change.
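A quick sketch of that stack/unstack pivoting on made-up counts (note that `stack()`'s handling of the `NaN` holes varies across pandas versions):

```python
import pandas as pd

# Long format: one row per (date, condition) pair
df = pd.DataFrame({
    'date': ['2018-12-09', '2018-12-09', '2018-12-10'],
    'condition': ['EMS', 'EPN', 'EMS'],
    'count': [15, 20, 25],
}).set_index(['date', 'condition'])

# Wide: one column per condition; missing pairs become NaN
wide = df['count'].unstack('condition')
print(wide)

# And back to long again
long_again = wide.stack()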
If you already have a process that collects daily counts with a timestamp, you should append them to your existing DataFrame like this:
import pandas as pd
import numpy as np
import string
from datetime import datetime as dt

# Generate some random three-letter condition codes
letters = list(string.ascii_uppercase)
conditions = []
for cond in range(20):
    cond = ''.join(list(np.random.choice(letters, 3)))
    conditions.append(cond)

# Day 1: a random subset of conditions with random counts
conds = list(np.random.choice(conditions, np.random.randint(3, 6)))
counts = list(np.random.randint(1, 100, size=(len(conds))))
ts = (1544479493.979077 - 87400)
df = pd.DataFrame({'date': dt.fromtimestamp(ts).date(),
                   'timestamp': dt.fromtimestamp(ts),
                   'conditions': conds,
                   'counts': counts})
df.set_index('date', inplace=True)

# Day 2: another random subset, appended to the same frame
conds = list(np.random.choice(conditions, np.random.randint(3, 6)))
counts = list(np.random.randint(1, 100, size=(len(conds))))
ts = (1544479493.979077)
df1 = pd.DataFrame({'date': dt.fromtimestamp(ts).date(),
                    'timestamp': dt.fromtimestamp(ts),
                    'conditions': conds,
                    'counts': counts})
df1.set_index('date', inplace=True)

df = df.append(df1)
print(df)
                            timestamp conditions  counts
date
2018-12-09 2018-12-09 13:48:13.979077        DWX      48
2018-12-09 2018-12-09 13:48:13.979077        TJC      95
2018-12-09 2018-12-09 13:48:13.979077        MFV       7
2018-12-10 2018-12-10 14:04:53.979077        AZQ      96
2018-12-10 2018-12-10 14:04:53.979077        BGX      23
2018-12-10 2018-12-10 14:04:53.979077        UFU      43
2018-12-10 2018-12-10 14:04:53.979077        WLT      85
Now that your data is in this format, you can easily pivot it however you want:
df.groupby(['date','conditions']).sum().unstack('conditions')
            counts
conditions     AZQ   BGX   DWX  MFV   TJC   UFU   WLT
date
2018-12-09     NaN   NaN  48.0  7.0  95.0   NaN   NaN
2018-12-10    96.0  23.0   NaN  NaN   NaN  43.0  85.0
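If you do drop JSON in favor of the long row format, persisting each day's rows is just a header-less CSV append. A sketch using an in-memory buffer to stand in for the file (a real path would use `to_csv(path, mode='a', header=False)` after the first write):

```python
import io
import pandas as pd

# Existing history, already on disk with a header row
history = pd.DataFrame({
    'date': ['2018-12-09', '2018-12-09'],
    'conditions': ['DWX', 'TJC'],
    'counts': [48, 95],
})
buf = io.StringIO()
history.to_csv(buf, index=False)

# A new day's counts: append rows only, no second header
new_day = pd.DataFrame({'date': ['2018-12-10'],
                        'conditions': ['AZQ'],
                        'counts': [96]})
new_day.to_csv(buf, index=False, header=False)

# Reading the whole file back gives one long frame
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded)
```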