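I am trying to create a multi-level nested dictionary from a pandas DataFrame. In the example below, for each postal code I want to retrieve the sum of salaries for every combination of sex and age.
The output must be the dictionary shown under the "Expected output" comment.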
from typing import NamedTuple, Sequence, Tuple
import pandas as pd
data = [
["tom", 22, "ab 11", "M", 5555],
["Rob", 22, "ab 11", "M", 9999],
["nick", 33, "ab 22", "M", 3333],
["juli", 18, "ab 11", "F", 2222],
]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode", "Sex", "Salary"])
d = (
people.groupby(["PostalCode", "Sex", "Age"])["Salary"]
.apply(sum)
.to_dict()
)
print(d)
# Expected output
print({"ab 11": {("M", 22): 15554, ("F", 18): 2222}, "ab 22": {("M", 33): 3333}})
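Answer 0 (score: 2)
Just change your solution slightly and add a dictionary comprehension. Sum the groups, move the PostalCode level into the columns with unstack(0), then build one inner dictionary per column, dropping the NaN entries that unstack introduces for combinations that do not occur in a given postal code: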
df = (
people.groupby(["PostalCode", "Sex", "Age"])["Salary"]
.sum()
.unstack(0)
)
d = {col: df[col].dropna().to_dict() for col in df}
print(d)
Out[40]:
{'ab 11': {('F', 18): 2222.0, ('M', 22): 15554.0},
'ab 22': {('M', 33): 3333.0}}
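
Note that the values in the output above are floats (2222.0 rather than 2222), because unstack fills the missing combinations with NaN and that forces the column to a float dtype. If integer sums are needed, as in the expected output, one possible alternative is to skip unstack and build the nested dictionary directly from the grouped Series. This is only a sketch, not part of the original answer; the names totals and nested are chosen here for illustration.

```python
from collections import defaultdict

import pandas as pd

data = [
    ["tom", 22, "ab 11", "M", 5555],
    ["Rob", 22, "ab 11", "M", 9999],
    ["nick", 33, "ab 22", "M", 3333],
    ["juli", 18, "ab 11", "F", 2222],
]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode", "Sex", "Salary"])

# One summed salary per (PostalCode, Sex, Age) group; the result is a
# Series with a three-level MultiIndex and an integer dtype.
totals = people.groupby(["PostalCode", "Sex", "Age"])["Salary"].sum()

# Outer key: postal code. Inner key: (sex, age) tuple.
nested = defaultdict(dict)
for (postal_code, sex, age), salary in totals.items():
    # Cast to plain Python ints so the result holds no NumPy scalars.
    nested[postal_code][(sex, int(age))] = int(salary)

nested = dict(nested)
print(nested)
```

Because no NaN is ever introduced, the sums stay integers, at the cost of an explicit Python loop over the groups.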