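I am trying to generalize a function that converts a pandas DataFrame and a set of specified columns into a particular hierarchical format. I can do the conversion when the number of levels in the hierarchy is hard-coded, but I am stuck on applying recursion so that it can be used more broadly.
Ideally, I would like to be able to call the following: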
_convert_df_to_hierarchy_dict(df, root_name, columns, attr_dict)
The parameters are as follows:
Here is the working code for the 3-level and the 2-level conversion, respectively:
3 levels:
def _convert_df_to_hierarchical_dict_n3(df, root_name, level1, level2, level3, final_name, final_size):
    data_dict = {"name": root_name, "children": []}
    # One nested loop per hierarchy level; each loop filters df by the values chosen so far.
    for cat in df[level1].unique():
        tmp3 = {"name": str(cat), "children": []}
        for cat2 in df[df[level1] == cat][level2].unique():
            tmp2 = {"name": str(cat2), "children": []}
            for cat3 in df[(df[level1] == cat) & (df[level2] == cat2)][level3].unique():
                tmp = {"name": str(cat3), "children": []}
                # Leaves: one {"name", "size"} entry per matching row.
                for _, row in df[(df[level1] == cat) & (df[level2] == cat2) & (df[level3] == cat3)][[final_name, final_size]].iterrows():
                    tmp["children"].append({"name": row[final_name], "size": row[final_size]})
                tmp2["children"].append(tmp)
            tmp3["children"].append(tmp2)
        data_dict["children"].append(tmp3)
    return data_dict
2 levels:
def _convert_df_to_hierarchical_dict_n2(df, root_name, level1, level2, final_name, final_size):
    data_dict = {"name": root_name, "children": []}
    for cat in df[level1].unique():
        tmp3 = {"name": str(cat), "children": []}
        for cat2 in df[df[level1] == cat][level2].unique():
            tmp2 = {"name": str(cat2), "children": []}
            # Leaves: one {"name", "size"} entry per matching row.
            for _, row in df[(df[level1] == cat) & (df[level2] == cat2)][[final_name, final_size]].iterrows():
                tmp2["children"].append({"name": row[final_name], "size": row[final_size]})
            tmp3["children"].append(tmp2)
        data_dict["children"].append(tmp3)
    return data_dict
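Description of the current variables:
In case anyone is interested, the conversion puts the data into the form needed for the following visualization: https://bl.ocks.org/mbostock/7607535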