Doubly nested dictionary to a stacked DataFrame

Time: 2019-11-13 15:55:50

Tags: python pandas

Setup

A dictionary with the following structure:

subnetwork_dct = {518418568: {2: (478793912, 518418568, 518758448),
             3: (478793912, 518418568, 518758448, 1037590624),
             4: (478793912, 518418568, 518758448, 1037590624)},
 552214776: {2: (431042800, 552214776),
             3: (431042800,)},
 993280096: {2: (456917000, 993280096),
             3: (456917000, 993280096),
             4: (456917000, 993280096)}}

Expected output

A pandas DataFrame with the following layout:

0             1     2
518418568     2     478793912
518418568     2     518418568
518418568     2     518758448
518418568     3     478793912
518418568     3     518418568
518418568     3     518758448
518418568     3     1037590624
518418568     4     478793912
518418568     4     518418568
518418568     4     518758448
518418568     4     1037590624
552214776     2     431042800
552214776     2     552214776
552214776     3     431042800
...

Working solution:

My current approach works, but I am wondering whether there is a cleaner solution?

import pandas as pd

multi_index_dct = {(k1, k2): v2 for k1, v1 in subnetwork_dct.items()
                               for k2, v2 in v1.items()}

df = pd.DataFrame([multi_index_dct[i] for i in sorted(multi_index_dct)],
                  index=pd.MultiIndex.from_tuples(sorted(multi_index_dct)))

df_stacked = pd.DataFrame(df.stack()).reset_index()
df_stacked.drop('level_2', axis=1, inplace=True)
df_stacked.columns = [0,1,2]

df_stacked

3 answers:

Answer 0 (score: 8)

Try explode, available since pandas 0.25:

pd.DataFrame(subnetwork_dct).stack().explode().reset_index()
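
As written, the one-liner above stacks a frame whose rows are the inner keys (2, 3, 4), so the inner key ends up in the first column and the rows are grouped by inner key rather than by outer key. The following is a minimal sketch, not part of the original answer, of one way to get the layout requested in the question: transpose before stacking, then rename the columns. The name df_exploded is made up for illustration, and the dropna() call is a precaution under the assumption that some pandas versions keep missing (outer, inner) pairs when stacking.

```python
import pandas as pd

subnetwork_dct = {
    518418568: {2: (478793912, 518418568, 518758448),
                3: (478793912, 518418568, 518758448, 1037590624),
                4: (478793912, 518418568, 518758448, 1037590624)},
    552214776: {2: (431042800, 552214776),
                3: (431042800,)},
    993280096: {2: (456917000, 993280096),
                3: (456917000, 993280096),
                4: (456917000, 993280096)},
}

df_exploded = (
    pd.DataFrame(subnetwork_dct)
    .T              # outer keys become the row index
    .stack()        # Series indexed by (outer key, inner key), values are tuples
    .dropna()       # precaution: discard missing pairs such as (552214776, 4)
    .explode()      # one row per tuple element (needs pandas >= 0.25)
    .reset_index()
)
df_exploded.columns = [0, 1, 2]

# explode() leaves the values as Python objects; cast them back to integers.
df_exploded[2] = df_exploded[2].astype("int64")
```

The cast at the end is optional; without it, column 2 keeps the object dtype.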

Answer 1 (score: 4)

Comprehension

pd.DataFrame([
    (k0, k1, v) for k0, d in subnetwork_dct.items()
                for k1, V in d.items()
                for v     in V
])

            0  1           2
0   518418568  2   478793912
1   518418568  2   518418568
2   518418568  2   518758448
3   518418568  3   478793912
4   518418568  3   518418568
5   518418568  3   518758448
6   518418568  3  1037590624
7   518418568  4   478793912
8   518418568  4   518418568
9   518418568  4   518758448
10  518418568  4  1037590624
11  552214776  2   431042800
12  552214776  2   552214776
13  552214776  3   431042800
14  993280096  2   456917000
15  993280096  2   993280096
16  993280096  3   456917000
17  993280096  3   993280096
18  993280096  4   456917000
19  993280096  4   993280096
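
Note that the comprehension emits rows in dictionary insertion order, whereas the approach in the question sorts the keys first. For the sample data the two orders coincide. If the dictionary could arrive unsorted, a small variation along these lines might be used; it is only a sketch, and the shortened, deliberately shuffled dictionary and the name df_sorted are made up for illustration.

```python
import pandas as pd

# Shortened sample with keys deliberately out of order (illustrative only).
subnetwork_dct = {
    993280096: {3: (456917000, 993280096),
                2: (456917000, 993280096)},
    552214776: {2: (431042800, 552214776),
                3: (431042800,)},
}

# Iterate over sorted outer and inner keys so the row order does not
# depend on how the dictionary was built.
rows = [
    (k0, k1, v)
    for k0 in sorted(subnetwork_dct)
    for k1 in sorted(subnetwork_dct[k0])
    for v in subnetwork_dct[k0][k1]
]
df_sorted = pd.DataFrame(rows)
```

The order of the values inside each tuple is left untouched.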

Answer 2 (score: 2)

Not sure whether it is pretty, but it is concise... sort of.

df_stacked = (pd.DataFrame(subnetwork_dct).T
                      .stack()
                      .apply(pd.Series)
                      .stack()
                      .reset_index(-1, drop=True)
                      .reset_index())
df_stacked.columns = [0,1,2]


Out[76]: 
           0  1            2
0  518418568  2  478793912.0
1  518418568  2  518418568.0
2  518418568  2  518758448.0
3  518418568  3  478793912.0
4  518418568  3  518418568.0
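
The values in column 2 above are floats because apply(pd.Series) pads the shorter tuples with NaN, which forces a float dtype before the second stack() removes the padding again. If integers are wanted, the column can be cast back at the end. This is a hedged sketch of that variation, not part of the original answer; the two dropna() calls are added as a precaution under the assumption that some pandas versions keep NaN entries when stacking.

```python
import pandas as pd

subnetwork_dct = {
    518418568: {2: (478793912, 518418568, 518758448),
                3: (478793912, 518418568, 518758448, 1037590624),
                4: (478793912, 518418568, 518758448, 1037590624)},
    552214776: {2: (431042800, 552214776),
                3: (431042800,)},
    993280096: {2: (456917000, 993280096),
                3: (456917000, 993280096),
                4: (456917000, 993280096)},
}

df_stacked = (
    pd.DataFrame(subnetwork_dct).T
    .stack()
    .dropna()                    # precaution: drop missing (outer, inner) pairs
    .apply(pd.Series)            # one column per tuple position, NaN-padded
    .stack()
    .dropna()                    # precaution: drop the NaN padding
    .reset_index(-1, drop=True)  # discard the tuple-position level
    .reset_index()
)
df_stacked.columns = [0, 1, 2]

# All surviving values are whole numbers, so the cast is lossless here.
df_stacked[2] = df_stacked[2].astype("int64")
```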