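I have a stacked DataFrame with 4 columns, and each column has its own dictionary. For each column, I want its dictionary to fill in every row it has a value for.
Here is the DataFrame itself: https://i.imgur.com/DJ1xHnc.png
For example, for the LB column, the dictionary LB_data must fill in every value it can: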
[{'Essendon': 1.32, 'St Kilda': 3.3},
{'Carlton': 5.0, 'Port Adelaide': 1.16},
{'Geelong Cats': 1.57, 'Melbourne': 2.36},
{'Greater Western Sydney': 2.75, 'West Coast Eagles': 1.44},
{'Brisbane': 1.95, 'North Melbourne': 1.85},
{'Hawthorn': 1.38, 'Western Bulldogs': 3.0},
{'Fremantle': 1.32, 'Gold Coast': 3.3}]
I tried building a new dictionary keyed by the stacked row labels, but I am not sure how to pass it to the DataFrame:
{'Essendon v St Kilda': {'Essendon': 1.32, 'St Kilda': 3.3}, 'Carlton v Port Adelaide': {'Port Adelaide': 1.16, 'Carlton': 5.0}, 'Geelong Cats v Melbourne': {'Geelong Cats': 1.57, 'Melbourne': 2.36}, 'Greater Western Sydney v West Coast Eagles': {'West Coast Eagles': 1.44, 'Greater Western Sydney': 2.75}, 'Brisbane v North Melbourne': {'North Melbourne': 1.85, 'Brisbane': 1.95}, 'Hawthorn v Western Bulldogs': {'Hawthorn': 1.38, 'Western Bulldogs': 3.0}, 'Fremantle v Gold Coast': {'Gold Coast': 3.3, 'Fremantle': 1.32}}
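The other 3 columns have similar dictionaries.
How should I build the dictionaries so that the data can easily be put into the DataFrame?
Thanks!
Answer 0 (score: 0)
First, your data should be in the following format: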
{column_1: {(l0_index_0, l1_index_0): value,
            (l0_index_0, l1_index_1): value,
            (l0_index_1, l1_index_0): value...
           },
 column_2: {l0_index_0...
           }
 ...
}
For example:
import pandas as pd

# Outer keys are column names; inner keys are (match, team) tuples.
data = {'LB': {('Brisbane v Port Adelaide', 'Brisbane'): 1, ('Brisbane v Port Adelaide', 'Port Adelaide'): 2,
               ('Fremantle v St Kilda', 'Fremantle'): 3, ('Fremantle v St Kilda', 'St Kilda'): 4},
        'PB': {('Brisbane v Port Adelaide', 'Brisbane'): 5, ('Brisbane v Port Adelaide', 'Port Adelaide'): 6,
               ('Fremantle v St Kilda', 'Fremantle'): 7, ('Fremantle v St Kilda', 'St Kilda'): 8},
        'SB': {('Brisbane v Port Adelaide', 'Brisbane'): 9, ('Brisbane v Port Adelaide', 'Port Adelaide'): 10,
               ('Fremantle v St Kilda', 'Fremantle'): 11, ('Fremantle v St Kilda', 'St Kilda'): 12},
        'NEDS': {('Brisbane v Port Adelaide', 'Brisbane'): 13, ('Brisbane v Port Adelaide', 'Port Adelaide'): 14,
                 ('Fremantle v St Kilda', 'Fremantle'): 15, ('Fremantle v St Kilda', 'St Kilda'): 16},
        }
pd.DataFrame(data)
Output:
                                        LB  PB  SB  NEDS
Brisbane v Port Adelaide Brisbane        1   5   9    13
                         Port Adelaide   2   6  10    14
Fremantle v St Kilda     Fremantle       3   7  11    15
                         St Kilda        4   8  12    16
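The nested dictionary shown in the question (match label mapped to a dictionary of team odds) is already close to this layout. As a minimal sketch, assuming that shape, it can be flattened into tuple keys with a single comprehension. The names `nested` and `flat` are illustrative, and only two of the question's entries are copied here:

```python
import pandas as pd

# Two entries copied from the nested dictionary in the question.
nested = {
    'Essendon v St Kilda': {'Essendon': 1.32, 'St Kilda': 3.3},
    'Carlton v Port Adelaide': {'Port Adelaide': 1.16, 'Carlton': 5.0},
}

# Flatten {match: {team: odds}} into {(match, team): odds}.
flat = {(match, team): odds
        for match, teams in nested.items()
        for team, odds in teams.items()}

# Tuple keys in the inner dictionary become a two-level row index.
df = pd.DataFrame({'LB': flat})
print(df)
```

Because the match label is taken directly from the outer key, the order of the teams inside each inner dictionary does not affect the row labels.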
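So, since your input data is in four separate data sets (one per column), they need to be combined in a way that follows this format.
Using LB_data as defined in the question, we can define a simple function that converts each entry into the required format:
def transform(d):
    # Build the match label from the team names, e.g. 'Essendon v St Kilda'.
    keys = list(d)
    combined = ' v '.join(keys)
    # Key each value by (match label, team name).
    return {(combined, key): value for key, value in d.items()}

# Merge the transformed entries into one dictionary for the LB column.
pd.DataFrame({'LB': {k: v for datum in LB_data for k, v in transform(datum).items()}})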
Which gives:
                                                                     LB
Brisbane v North Melbourne                 Brisbane                1.95
                                           North Melbourne         1.85
Carlton v Port Adelaide                    Carlton                 5.00
                                           Port Adelaide           1.16
Essendon v St Kilda                        Essendon                1.32
                                           St Kilda                3.30
Fremantle v Gold Coast                     Fremantle               1.32
                                           Gold Coast              3.30
Geelong Cats v Melbourne                   Geelong Cats            1.57
                                           Melbourne               2.36
Greater Western Sydney v West Coast Eagles Greater Western Sydney  2.75
                                           West Coast Eagles       1.44
Hawthorn v Western Bulldogs                Hawthorn                1.38
                                           Western Bulldogs        3.00
By doing the same for the other input data sets and merging the results, you get a DataFrame in the desired format.
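As a rough sketch of that last step, the per-column lists can be collected in one dictionary and converted together. The helper `build_column`, the `sources` mapping, and the `PB_data` values are assumptions made for illustration; only the two `LB_data` entries come from the question:

```python
import pandas as pd

def transform(d):
    # Same helper as above: key each value by (match label, team).
    combined = ' v '.join(d)
    return {(combined, team): odds for team, odds in d.items()}

def build_column(records):
    # Hypothetical helper: merge a list of per-match dicts into one tuple-keyed dict.
    return {k: v for record in records for k, v in transform(record).items()}

# LB values are taken from the question.
LB_data = [{'Essendon': 1.32, 'St Kilda': 3.3},
           {'Carlton': 5.0, 'Port Adelaide': 1.16}]
# PB values are made up; the Carlton match is deliberately missing.
PB_data = [{'Essendon': 1.30, 'St Kilda': 3.4}]

sources = {'LB': LB_data, 'PB': PB_data}
df = pd.DataFrame({name: build_column(records) for name, records in sources.items()})
print(df)
```

Rows that a column has no value for are left as NaN. Note that `transform` builds the match label from the order of the keys, so if two sources list the same teams in a different order the labels will not line up; sorting the team names before joining them is one way to keep the labels consistent.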