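Creating a multi-index `DataFrame` from a nested dictionary

Asked: 2016-10-26 12:16:07

Tags: python pandas dictionary nested series

This question is related to this one. This time I want to go a step further. Given a dictionary: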

import numpy

dd = {0: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},

      1: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}}

or a similar list:

ll = [{"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},

      {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}]

I would like to build a DataFrame like this:

                          russell                            godel                        cantor
                    score    ping                    score    ping                 score    ping
0     0.17473916938994682      40       0.3443303845926545      47   0.43576522521017247      42
1      0.7341005512329682      22      0.14682222267827938      81    0.5662517436162526      59

As we can see, the column index is a MultiIndex. Is there a way to achieve this? If I try pandas.DataFrame.from_dict(dd, orient="index") or pandas.DataFrame(ll), I get:

                                      russell                                       godel                                      cantor
0  {'score': 0.17473916938994682, 'ping': 40}   {'score': 0.3443303845926545, 'ping': 47}  {'score': 0.43576522521017247, 'ping': 42}
1   {'score': 0.7341005512329682, 'ping': 22}  {'score': 0.14682222267827938, 'ping': 81}   {'score': 0.5662517436162526, 'ping': 59}

This is not what I want.

2 answers:

Answer 0: (score: 1)

Now it is more complicated, but Panel, transpose, to_frame and unstack can help:

df = pd.Panel(dd).transpose(2,0,1).to_frame().unstack()
print (df)
      cantor           godel           russell          
minor   ping     score  ping     score    ping     score
major                                                   
0       69.0  0.050641  51.0  0.765994    20.0  0.935196
1       91.0  0.398624  33.0  0.408681    75.0  0.464876
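
Note that pd.Panel was removed in pandas 0.25, so the one-liner above only runs on older versions. As a possible alternative (not part of the original answer), here is a minimal sketch that flattens the two inner dictionary levels into (name, field) tuple keys and lets pandas build the column MultiIndex from them. The fixed sample values and the variable name `flat` are assumptions made for illustration.

```python
import pandas as pd

# Fixed sample values (hypothetical) instead of numpy.random, so runs are repeatable.
dd = {
    0: {"russell": {"score": 0.17, "ping": 40},
        "cantor": {"score": 0.44, "ping": 42},
        "godel": {"score": 0.34, "ping": 47}},
    1: {"russell": {"score": 0.73, "ping": 22},
        "cantor": {"score": 0.57, "ping": 59},
        "godel": {"score": 0.15, "ping": 81}},
}

# Flatten the two inner levels into (name, field) tuple keys, one dict per row.
flat = {
    row: {(name, field): value
          for name, fields in people.items()
          for field, value in fields.items()}
    for row, people in dd.items()
}

# Outer keys become the row index, tuple keys become the columns.
df = pd.DataFrame.from_dict(flat, orient="index")

# Make the two-level column index explicit, whatever the constructor inferred.
df.columns = pd.MultiIndex.from_tuples(list(df.columns))

print(df)
```

For the list `ll`, iterating over `enumerate(ll)` instead of `dd.items()` should work the same way. Because each column is built from its own values, the ping columns can stay integers here rather than being upcast to float as in the output above.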

Answer 1: (score: 1)

This works as well. Note that your nested dictionary is not really nested in a way that translates easily.

 pd.concat({key:pd.DataFrame(dd[key]) for key in dd.keys()}).unstack()
Out[104]: 
  cantor           godel           russell          
    ping     score  ping     score    ping     score
0   73.0  0.463084  94.0  0.954662    76.0  0.732291
1   28.0  0.778905  81.0  0.984285    36.0  0.094173

In short, creating a multi-index df with concat is quite simple. You just need a dictionary of DataFrames.
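
To make the intermediate steps visible, the same concat idea can be sketched for the list `ll`, one step at a time. The fixed sample values and the names `frames`, `stacked` and `wide` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Fixed sample values (hypothetical) so the example is repeatable.
ll = [
    {"russell": {"score": 0.17, "ping": 40},
     "cantor": {"score": 0.44, "ping": 42},
     "godel": {"score": 0.34, "ping": 47}},
    {"russell": {"score": 0.73, "ping": 22},
     "cantor": {"score": 0.57, "ping": 59},
     "godel": {"score": 0.15, "ping": 81}},
]

# Step 1: one small DataFrame per element.
# Columns are the names, the index holds "score" and "ping".
frames = {i: pd.DataFrame(entry) for i, entry in enumerate(ll)}

# Step 2: concat stacks the frames and uses the dict keys
# as the outer level of a two-level row index.
stacked = pd.concat(frames)

# Step 3: unstack moves the inner row level (score/ping) into the columns,
# giving (name, field) column pairs and one row per list element.
wide = stacked.unstack()

print(wide)
```

Since each small DataFrame mixes score and ping in the same column, the ping values come out as floats, which matches the output shown in the answer.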