我有下面的json文件,我想根据更新后的数据重新创建。
{"AAL": {"year": [2012, 2013, 2014, 2015],
"eps": [-5.6, -11.25, 4.02, 11.39],
"revenue": [24855.0, 26743.0, 42650.0, 40990.0],
"op_revenue": [148.0, 1399.0, 4249.0, 6204.0]},
"AAP": {"year": [2012, 2013, 2014, 2015],
"eps": [5.29, 5.36, 6.75, 6.45],
"revenue": [6205.003000000001, 6493.814, 9843.860999999999, 9737.018],
"op_revenue": [657.315, 660.318, 851.71, 825.78]},
"AAPL": {"year": [2013, 2014, 2015, 2016],
"eps": [40.03, 6.49, 9.28, 8.35],
"revenue": [171000.0, 183000.0, 234000.0, 216000.0],
"op_revenue": [48999.0, 52503.0, 71230.0, 60024.0]}
...}
我的数据来自形状完全相同的三个表格( eps,收入,op_revenue )。下面是一个表的前几行(第一列标题是股票行情指示器,其余列是年份)。
ticker 2012 2013 2014 2015 2016 2017 2018
1 A 938000000 724000000 740000000 713000000 692000000 504000000 381000000
2 AAL 431000000 -1833000000 -1012000000 -752000000 -99000000 2499000000 2951000000
3 AAN 134624000 120666000 108005000 90656000 78813000 78233000 89137000
4 AAOI 390000 -131000 -46000 1873000 3060000 4283000 3523000
5 AAON 37359000 37547000 40229000 39473000 41391000 44158000 42735000
6 AAP 407546000 391758000 417694000 440311000 458658000 493825000 494211000
如何重新创建json文件?
答案 0 :(得分:1)
考虑将每个数据帧从宽到长分解(即,不使用 year 数据值作为元素),然后与pandas.concat()
串联,最后在< em> ticker + to_dict
:
groupby
输出 (重复所有三个数据集的OP发布数据)
df_dict = {'eps': eps, 'revenue': revenue, 'op_revenue': op_revenue}
# MELTING WIDE TO LONG
new_df_dict = {k:(pd.melt(v, id_vars = "ticker", var_name = "year", value_name = k)
.set_index(["ticker", "year"])
) for k,v in df_dict.items()}
# HORIZONTAL CONCATENATING
final_df = (pd.concat(new_df_dict, axis="columns")
.sort_index()
.reset_index()
)
final_df.columns = final_df.columns.get_level_values(0)
# TICKER GROUPBY DICTIONARY
final_dict = {i: g.drop(columns='ticker').to_dict(orient='list') \
for i,g in final_df.groupby('ticker')}
# OUTPUT TO JSON
with open('Output.json', 'w') as f:
f.write(json.dumps(final_dict, indent=3))