Populating a stacked DataFrame with values from dictionaries

Time: 2019-03-30 02:32:18

Tags: python pandas dataframe dictionary

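I have a stacked DataFrame with 4 columns, and each column has its own dictionary of values. For each column, I want the corresponding dictionary to fill in every row it has a value for.

Here is the DataFrame itself: https://i.imgur.com/DJ1xHnc.png

For example, for the LB column, LB_data (a list of dictionaries) must fill in every value it can: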

[{'Essendon': 1.32, 'St Kilda': 3.3},
 {'Carlton': 5.0, 'Port Adelaide': 1.16},
 {'Geelong Cats': 1.57, 'Melbourne': 2.36},
 {'Greater Western Sydney': 2.75, 'West Coast Eagles': 1.44},
 {'Brisbane': 1.95, 'North Melbourne': 1.85},
 {'Hawthorn': 1.38, 'Western Bulldogs': 3.0},
 {'Fremantle': 1.32, 'Gold Coast': 3.3}]

I tried building a new dictionary keyed by the stacked rows, but I'm not sure how to pass it to the DataFrame:

{'Essendon v St Kilda': {'Essendon': 1.32, 'St Kilda': 3.3}, 'Carlton v Port Adelaide': {'Port Adelaide': 1.16, 'Carlton': 5.0}, 'Geelong Cats v Melbourne': {'Geelong Cats': 1.57, 'Melbourne': 2.36}, 'Greater Western Sydney v West Coast Eagles': {'West Coast Eagles': 1.44, 'Greater Western Sydney': 2.75}, 'Brisbane v North Melbourne': {'North Melbourne': 1.85, 'Brisbane': 1.95}, 'Hawthorn v Western Bulldogs': {'Hawthorn': 1.38, 'Western Bulldogs': 3.0}, 'Fremantle v Gold Coast': {'Gold Coast': 3.3, 'Fremantle': 1.32}}

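The other 3 columns have similar dictionaries.

How should I build the dictionaries so that the data can be dropped into the DataFrame easily?

Thanks!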

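1 Answer:

Answer 0 (score: 0)

First, your data should be in the following format, with one dictionary per column and each value keyed by a (level-0 index, level-1 index) tuple: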

{column_1: {(l0_index_0, l1_index_0): value, 
            (l0_index_0, l1_index_1): value,
            (l0_index_1, l1_index_0): value...
            },
 column_2: {l0_index_0...
            }
 ...
 }

For example:

import pandas as pd

data = {'LB': {('Brisbane v Port Adelaide', 'Brisbane'): 1, ('Brisbane v Port Adelaide', 'Port Adelaide'): 2,
               ('Fremantle v St Kilda', 'Fremantle'): 3, ('Fremantle v St Kilda', 'St Kilda'): 4},
        'PB': {('Brisbane v Port Adelaide', 'Brisbane'): 5, ('Brisbane v Port Adelaide', 'Port Adelaide'): 6,
               ('Fremantle v St Kilda', 'Fremantle'): 7, ('Fremantle v St Kilda', 'St Kilda'): 8},
        'SB': {('Brisbane v Port Adelaide', 'Brisbane'): 9, ('Brisbane v Port Adelaide', 'Port Adelaide'): 10,
               ('Fremantle v St Kilda', 'Fremantle'): 11, ('Fremantle v St Kilda', 'St Kilda'): 12},
        'NEDS': {('Brisbane v Port Adelaide', 'Brisbane'): 13, ('Brisbane v Port Adelaide', 'Port Adelaide'): 14,
                 ('Fremantle v St Kilda', 'Fremantle'): 15, ('Fremantle v St Kilda', 'St Kilda'): 16},
        }

pd.DataFrame(data)

Output:

                                        LB  PB  SB  NEDS
Brisbane v Port Adelaide Brisbane        1   5   9    13
                         Port Adelaide   2   6  10    14
Fremantle v St Kilda     Fremantle       3   7  11    15
                         St Kilda        4   8  12    16

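Since your input data is split across four separate collections of dictionaries (one per column), you need to combine them in some way that conforms to this format.

Using LB_data as defined in the question, we can write a simple function that converts each dictionary into the required format: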

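def transform(d):
    # Build the match label from the team names, e.g. 'Essendon v St Kilda'.
    # This relies on dict insertion order (guaranteed from Python 3.7).
    keys = list(d)
    combined = ' v '.join(keys)
    # Key every value by (match, team) so pandas builds a two-level index.
    return {(combined, key): value for key, value in d.items()}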

pd.DataFrame({'LB': {k: v for datum in LB_data for k, v in transform(datum).items()}})

Which gives:

                                                                     LB
Brisbane v North Melbourne                 Brisbane                1.95
                                           North Melbourne         1.85
Carlton v Port Adelaide                    Carlton                 5.00
                                           Port Adelaide           1.16
Essendon v St Kilda                        Essendon                1.32
                                           St Kilda                3.30
Fremantle v Gold Coast                     Fremantle               1.32
                                           Gold Coast              3.30
Geelong Cats v Melbourne                   Geelong Cats            1.57
                                           Melbourne               2.36
Greater Western Sydney v West Coast Eagles Greater Western Sydney  2.75
                                           West Coast Eagles       1.44
Hawthorn v Western Bulldogs                Hawthorn                1.38
                                           Western Bulldogs        3.00
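
If you already have the nested dictionary from the question (match label as the outer key), a variant that skips transform may also work. The sketch below flattens that dictionary into (match, team) keys and assigns it to an existing frame. The frame, its index, and the extra 'Fremantle v Gold Coast' row are made up for illustration, since the real frame is only available as an image; the names stacked, flat and df are likewise just placeholders.

```python
import pandas as pd

# Nested dictionary in the shape built in the question: {match: {team: odds}}.
stacked = {
    'Essendon v St Kilda': {'Essendon': 1.32, 'St Kilda': 3.3},
    'Carlton v Port Adelaide': {'Port Adelaide': 1.16, 'Carlton': 5.0},
}

# Flatten to {(match, team): odds}. The match label comes from the outer key,
# so the order of the inner keys does not matter here.
flat = {(match, team): odds
        for match, teams in stacked.items()
        for team, odds in teams.items()}

# Hypothetical existing frame: a two-level index and an empty LB column.
index = pd.MultiIndex.from_tuples([
    ('Essendon v St Kilda', 'Essendon'),
    ('Essendon v St Kilda', 'St Kilda'),
    ('Carlton v Port Adelaide', 'Carlton'),
    ('Carlton v Port Adelaide', 'Port Adelaide'),
    ('Fremantle v Gold Coast', 'Fremantle'),
])
df = pd.DataFrame(index=index, columns=['LB'], dtype=float)

# Assignment aligns on the index; rows with no entry in `flat` stay NaN.
df['LB'] = pd.Series(flat)
print(df)
```

Because the assignment aligns on the index, each dictionary only fills the rows it has values for, which seems to be what the question asks.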

By doing the same for the other input datasets and combining the results, you can get a DataFrame in the desired format.
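
A minimal sketch of that combining step, assuming every column's input has the same list-of-dictionaries shape as LB_data. The PB_data values are invented for illustration (the question does not show them), and to_series and sources are hypothetical helper names.

```python
import pandas as pd

def transform(d):
    # Same idea as above: key each value by (match label, team).
    combined = ' v '.join(d)
    return {(combined, team): value for team, value in d.items()}

def to_series(records):
    # Merge the per-match dictionaries of one column into a single Series
    # whose tuple keys become a two-level index.
    merged = {}
    for record in records:
        merged.update(transform(record))
    return pd.Series(merged)

LB_data = [{'Essendon': 1.32, 'St Kilda': 3.3},
           {'Carlton': 5.0, 'Port Adelaide': 1.16}]
# Made-up values; this source deliberately lacks the Carlton match.
PB_data = [{'Essendon': 1.30, 'St Kilda': 3.40}]

sources = {'LB': LB_data, 'PB': PB_data}

# Building from a dict of Series aligns the columns on the union of their
# indexes, so a match missing from one source shows up as NaN in that column.
df = pd.DataFrame({name: to_series(records) for name, records in sources.items()})
print(df)
```

Adding the remaining columns should only require further entries in sources, as long as the team names are spelled the same way in every input.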