Question

What is an efficient way to build a grouped DataFrame from a nested dictionary?
Code snippet

import pandas as pd

# Encoding description
encoding_dict = {"Age":{"Middle": 0,
                        "Senior": 1,
                        "Young": 2},
                 "Sex":{"F": 0,
                        "M": 1},
                 "BP":{"High": 0,
                       "Low": 1,
                       "Normal": 2},
                 "Cholesterol":{"High": 0,
                                "Normal": 1}}

# Step 1 : Create DataFrame
df_1 = pd.DataFrame({"Features": ["Age"]*3 + ["Sex"]*2 + ["BP"]*3 + ["Cholesterol"]*2,
                     "Categories":["Middle", "Senior", "Young", "F", "M", "High", "Low","Normal", "High", "Normal"],
                     "Encoding":[0, 1, 2, 0, 1, 0, 1, 2, 0, 1]})

# Step 2 : Grouped DataFrame
grouped = df_1.groupby(["Features", "Categories"]).sum()
print(grouped)

Output

                           Encoding
Features    Categories          
Age         Middle             0
            Senior             1
            Young              2
BP          High               0
            Low                1
            Normal             2
Cholesterol High               0
            Normal             1
Sex         F                  0
            M                  1

Is there an efficient way to build this grouped DataFrame directly from the nested dictionary, without writing out Step 1 by hand?

Answer 1

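A dictionary comprehension that flattens each (feature, category) pair into a tuple key, passed to the DataFrame constructor and followed by naming the index levels, works: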

df = pd.DataFrame(
    {'encoding': {(k, sub_k): v
                  for k, sub_d in encoding_dict.items()
                  for sub_k, v in sub_d.items()}}
).rename_axis(index=['Features', 'Categories'])

df:

                        encoding
Features    Categories          
Age         Middle             0
            Senior             1
            Young              2
BP          High               0
            Low                1
            Normal             2
Cholesterol High               0
            Normal             1
Sex         F                  0
            M                  1
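
A variant of the same idea, as a minimal sketch rather than the answer's own method: flatten the nested dictionary into plain rows first, then move the two label columns into the index. The names `rows` and `df_flat` are illustrative and not from the original post; the explicit `sort_index()` is added on the assumption that you want the same ordering as the groupby output regardless of the dictionary's insertion order.

```python
import pandas as pd

encoding_dict = {
    "Age": {"Middle": 0, "Senior": 1, "Young": 2},
    "Sex": {"F": 0, "M": 1},
    "BP": {"High": 0, "Low": 1, "Normal": 2},
    "Cholesterol": {"High": 0, "Normal": 1},
}

# Flatten the nested dictionary into (feature, category, code) rows.
rows = [
    (feature, category, code)
    for feature, categories in encoding_dict.items()
    for category, code in categories.items()
]

# Build the frame, move the two label columns into a MultiIndex,
# and sort the index so it matches the ordering of the groupby result.
df_flat = (
    pd.DataFrame(rows, columns=["Features", "Categories", "Encoding"])
    .set_index(["Features", "Categories"])
    .sort_index()
)
print(df_flat)
```

This keeps the column name `Encoding` from the question, so the result can be compared directly with `grouped`.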
