Creating a JSON structure from DataFrames

Time: 2020-10-29 16:05:19

Tags: json dataframe

I have two Excel files:

2020-01-consumption.xlsx
2020-01-production.xlsx
print(pd.read_excel('2020-01-production.xlsx', index_col=0).head(4))


                         build_1   build_2               build_3       build_4  ...  
date                                                                            ...
2020-01-01 00:00:00          1.2       4.2                   4.3           7.0  ...
2020-01-01 01:00:00          3.3       1.9                   5.3           3.5  ...
2020-01-01 02:00:00          4.1       2.7                   6.0           2.6  ...
2020-01-01 03:00:00          3.6       6.0                   7.1           7.2  ...



print(pd.read_excel('2020-01-consumption.xlsx', index_col=0).head(4))


                         build_1   build_2               build_3       build_4  ...  
date                                                                            ...
2020-01-01 00:00:00          0.4       1.0                   0.1           1.0  ...
2020-01-01 01:00:00          0.3       0.9                   0.0           0.4  ...
2020-01-01 02:00:00          0.3       0.5                   0.0           0.4  ...
2020-01-01 03:00:00          0.1       0.5                   0.4           0.4  ...

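The columns and the index are identical in both files. I am trying to build a for loop that saves each column as its own JSON file, with the data restructured like this: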

with open('build_1.json', encoding="utf8") as f:  # The new file must be named after the column.
    content = json.load(f)

print(content)

{'build_1': {  # The key is the column name.
    'date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', '2020-01-01 02:00:00', '2020-01-01 03:00:00', ...],  # The index name becomes a key.
    'production': [1.2, 3.3, 4.1, 3.6, ...],  # The key is derived from the Excel file name.
    'consumption': [0.4, 0.3, 0.3, 0.1, ...]}}  # The key is derived from the Excel file name.

I have many DataFrames like these; production and consumption are just two examples. How can I achieve this structure? Is it possible?

1 Answer:

Answer 0: (score: 1)

You can use concat with the keys parameter, which produces a MultiIndex in the columns:

import pandas as pd

df1 = pd.read_excel('2020-01-production.xlsx', index_col=0)
df2 = pd.read_excel('2020-01-consumption.xlsx', index_col=0)

# keys become the first column level; the original column names become the second.
df = pd.concat([df1, df2], keys=['production', 'consumption'], axis=1)
print(df)

                    production                         consumption          \
                       build_1 build_2 build_3 build_4     build_1 build_2   
date                                                                         
2020-01-01 00:00:00        1.2     4.2     4.3     7.0         0.4     1.0   
2020-01-01 01:00:00        3.3     1.9     5.3     3.5         0.3     0.9   
2020-01-01 02:00:00        4.1     2.7     6.0     2.6         0.3     0.5   
2020-01-01 03:00:00        3.6     6.0     7.1     7.2         0.1     0.5   

                                     
                    build_3 build_4  
date                                 
2020-01-01 00:00:00     0.1     1.0  
2020-01-01 01:00:00     0.0     0.4  
2020-01-01 02:00:00     0.0     0.4  
2020-01-01 03:00:00     0.4     0.4  
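
For readers without the Excel files at hand, the same column layout can be reproduced with small in-memory frames. This is only a minimal sketch: the values are the first two rows of build_1 and build_2 from the tables in the question, and the variable names production, consumption and combined are illustrative, not part of the original answer.

```python
import pandas as pd

# Two hourly timestamps with the index named 'date', as in the Excel files.
idx = pd.DatetimeIndex(["2020-01-01 00:00:00", "2020-01-01 01:00:00"], name="date")

# Illustrative stand-ins for the two Excel sheets.
production = pd.DataFrame({"build_1": [1.2, 3.3], "build_2": [4.2, 1.9]}, index=idx)
consumption = pd.DataFrame({"build_1": [0.4, 0.3], "build_2": [1.0, 0.9]}, index=idx)

# keys label each input frame and become the first level of the column MultiIndex.
combined = pd.concat([production, consumption], keys=["production", "consumption"], axis=1)

# Each column is now addressed by a (source, building) tuple.
print(combined.columns.tolist())
```

A single series can then be picked with a tuple, for example combined[("consumption", "build_2")].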

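Then loop over the second column level, select each building with DataFrame.xs, convert the datetimes to strings (or to whatever other format you need), build the dictionary, and finally write it to a file: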

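import json

for lvl in df.columns.levels[1]:
    print(lvl)
    # Select this building's column from every source; the index becomes a 'date' column.
    df_lvl = df.xs(lvl, axis=1, level=1).reset_index()
    # Timestamps are not JSON serializable, so convert them to strings first.
    df_lvl['date'] = df_lvl['date'].astype(str)
    d = {lvl: df_lvl.to_dict(orient='list')}
    # print(d)

    with open(f'{lvl}.json', mode='w', encoding="utf8") as f:
        json.dump(d, f)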
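
Since the question mentions many DataFrames beyond production and consumption, the same steps can be wrapped in a helper that accepts any number of labelled frames. The following is a rough sketch under stated assumptions rather than part of the original answer: the function name write_column_json and the frames dictionary are hypothetical, every frame is assumed to share the same columns and an index named date, and in-memory data plus a temporary directory stand in for the Excel files so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def write_column_json(frames, out_dir):
    """Hypothetical helper: write one JSON file per column from {label: DataFrame}."""
    # Labels (e.g. 'production') become the first column level.
    combined = pd.concat(list(frames.values()), keys=list(frames.keys()), axis=1)
    written = []
    for col in combined.columns.get_level_values(1).unique():
        # One column per label for this building, plus the index as a 'date' column.
        part = combined.xs(col, axis=1, level=1).reset_index()
        part["date"] = part["date"].astype(str)  # make timestamps JSON serializable
        path = Path(out_dir) / f"{col}.json"
        with open(path, mode="w", encoding="utf8") as f:
            json.dump({col: part.to_dict(orient="list")}, f)
        written.append(path)
    return written


# Illustrative data: the first two rows of build_1 and build_2 from the question.
idx = pd.DatetimeIndex(["2020-01-01 00:00:00", "2020-01-01 01:00:00"], name="date")
frames = {
    "production": pd.DataFrame({"build_1": [1.2, 3.3], "build_2": [4.2, 1.9]}, index=idx),
    "consumption": pd.DataFrame({"build_1": [0.4, 0.3], "build_2": [1.0, 0.9]}, index=idx),
}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_column_json(frames, tmp)
    # Read the first file back to check the structure.
    with open(paths[0], encoding="utf8") as f:
        content = json.load(f)

print(list(content["build_1"].keys()))
```

With real data, the frames dictionary could be filled by calling pd.read_excel once per file and using a label taken from each file name as the key.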