Question

Say I have a dictionary that looks like this:

dictionary = {'A': {'a': [1, 2, 3, 4, 5],
                    'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6],
                    'b': [7, 8, 9, 1, 2]}}

I want the DataFrame to look like this:

   A     B
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

Is there a convenient way to do this? If I try:

In [99]:

DataFrame(dictionary)

Out[99]:
     A               B
a   [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b   [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]

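I get a DataFrame in which each element is a list. What I need is a MultiIndex where each level corresponds to the keys of the nested dict, with one row per element of the lists, as shown above. I think I could hack together a very crude solution, but I'm hoping there is something simpler.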

Answer 1

pandas wants MultiIndex values as tuples, not nested dicts. The simplest approach is to convert your dictionary to the right format before passing it to DataFrame:

>>> import pandas
>>> # On Python 3 use dict.items(); iteritems() exists only on Python 2.
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

[5 rows x 4 columns]
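
As a self-contained sketch of the same idea on Python 3, the snippet below builds the tuple-keyed dict and then selects from the resulting two-level columns. The variable names are only illustrative.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Flatten {outer: {inner: values}} into {(outer, inner): values}.
reform = {(outer, inner): values
          for outer, inner_dict in dictionary.items()
          for inner, values in inner_dict.items()}

df = pd.DataFrame(reform)

# Tuple keys become a two-level MultiIndex on the columns.
print(df.columns.nlevels)        # 2
print(df[('A', 'b')].tolist())   # [6, 7, 8, 9, 1]
print(df['B'].columns.tolist())  # ['a', 'b']
```

Selecting with a single outer key such as df['B'] returns the sub-frame for that key, while a full tuple returns one column.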

Answer 2

import pandas as pd

dict_of_df = {k: pd.DataFrame(v) for k, v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)

Note that the order of the columns will be lost on Python < 3.6.
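
A runnable sketch of this approach with the question's data is shown below, mainly to make visible that pd.concat uses the dict keys as the outer column level. It assumes Python 3.7+ so that dict insertion order is guaranteed.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# One DataFrame per outer key; the inner keys become its columns.
dict_of_df = {k: pd.DataFrame(v) for k, v in dictionary.items()}

# Concatenating a dict along axis=1 puts the dict keys on the outer column level.
df = pd.concat(dict_of_df, axis=1)

print(df.columns.tolist())
# [('A', 'a'), ('A', 'b'), ('B', 'a'), ('B', 'b')]
```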

Answer 3

This answer is a little late to the game, but...

You are looking for the functionality in .stack:

import pandas as pd

df = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# break the lists out into columns, one row per (outer, inner) key pair
df = pd.DataFrame(df[0].values.tolist(), index=df.index)
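
The stack-based steps leave the key pairs on the row index. A minimal sketch follows, using the question's data; the final transpose is an addition for illustration (not part of the answer) for the case where the keys should label the columns instead.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Rows 'A'/'B', columns 'a'/'b'; each cell holds a whole list.
nested = pd.DataFrame.from_dict(dictionary, orient="index")

# stack() moves the inner keys into the row index: (outer, inner) -> list.
stacked = nested.stack()

# Expand each list into its own row, indexed by the (outer, inner) pair.
by_row = pd.DataFrame(stacked.tolist(), index=stacked.index)

# Transpose so the key pairs label the columns.
df = by_row.T

print(by_row.shape)              # (4, 5)
print(df[('B', 'a')].tolist())   # [2, 3, 4, 5, 6]
```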

Answer 4

If the lists in the dictionary are not all the same length, you can adapt BrenBarn's method:

>>> import pandas as pd
>>> dictionary = {'A': {'a': [1, 2, 3, 4, 5],
...                     'b': [6, 7, 8, 9, 1]},
...               'B': {'a': [2, 3, 4, 5, 6],
...                     'b': [7, 8, 9, 1]}}

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1]}

>>> df = pd.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pd.MultiIndex.from_tuples(df.columns)
>>> df
     A         B
     a    b    a    b
0  1.0  6.0  2.0  7.0
1  2.0  7.0  3.0  8.0
2  3.0  8.0  4.0  9.0
3  4.0  9.0  5.0  1.0
4  5.0  1.0  6.0  NaN
[5 rows x 4 columns]
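
A variation on the same idea, offered only as a sketch and not as this answer's method: wrap each list in a pd.Series before building the frame. pandas then aligns the Series on their row index and pads the shorter one with NaN, so no transpose is needed. Passing the ragged lists directly to pd.DataFrame would raise a ValueError instead.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1]}}  # last list is shorter

# Series of different lengths are aligned on their index when combined,
# and missing positions are filled with NaN.
reform = {(outer, inner): pd.Series(values)
          for outer, inner_dict in dictionary.items()
          for inner, values in inner_dict.items()}

df = pd.DataFrame(reform)

print(df.shape)                          # (5, 4)
print(df[('B', 'b')].isna().tolist())    # [False, False, False, False, True]
```

Only the padded column is upcast to float here; the full-length columns keep their integer dtype.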

Answer 5

This recursive function should work:

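def reform_dict(dictionary, t=(), reform=None):
    """Flatten a nested dict into {tuple_of_keys: leaf_value}."""
    if reform is None:
        # A mutable default ({}) would be shared across separate calls.
        reform = {}
    for key, val in dictionary.items():
        path = t + (key,)  # key path from the root down to this entry
        if isinstance(val, dict):
            reform_dict(val, path, reform)
        else:
            reform[path] = val
    return reform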
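
To illustrate how a recursive flattening like this could be used, here is a small self-contained sketch with three levels of nesting. The helper name flatten_keys and the sample data are hypothetical, and the sketch assumes every list sits at the same depth so that all key tuples have equal length.

```python
import pandas as pd

def flatten_keys(nested, prefix=()):
    """Return {tuple_of_keys: leaf} for a nested dict (hypothetical helper)."""
    flat = {}
    for key, val in nested.items():
        path = prefix + (key,)
        if isinstance(val, dict):
            # Recurse and merge the flattened sub-dict into the result.
            flat.update(flatten_keys(val, path))
        else:
            flat[path] = val
    return flat

# Made-up data with three levels of keys above each list.
deep = {'A': {'a': {'x': [1, 2], 'y': [3, 4]}},
        'B': {'a': {'x': [5, 6], 'y': [7, 8]}}}

df = pd.DataFrame(flatten_keys(deep))

print(df.columns.nlevels)             # 3
print(df[('B', 'a', 'y')].tolist())   # [7, 8]
```

Returning a fresh dict from each call avoids any state being carried over between separate invocations.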

Nested dictionary to MultiIndex DataFrame where the dictionary keys are column labels

5 answers: