I want to dynamically flatten a parent-child hierarchy stored in a pandas DataFrame.
Notes:
Example input:
import pandas as pd
import numpy as np
pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.expand_frame_repr = False
pd.options.mode.chained_assignment = None
df = pd.DataFrame(
{
"child": ["xyz", "opr", "axz", "asd", "asd", "zxc", "zxc", "zxc"],
"parent": [np.nan, "xyz", "xyz", "opr", "opr", "opr", "axz", "xyz"],
}
)
print(df)
Expected output:
  level_0 level_1 leaf
0     xyz     opr  asd
1     xyz     opr  asd
2     xyz     opr  zxc
3     xyz     axz  zxc
4     xyz     NaN  zxc
Answer 0 (score: 0)
The leaves are the elements of the child column that do not appear in the parent column.
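As a minimal sketch of that leaf test on the question's sample data (the names `is_leaf` and `leaves` are only illustrative, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "child": ["xyz", "opr", "axz", "asd", "asd", "zxc", "zxc", "zxc"],
        "parent": [np.nan, "xyz", "xyz", "opr", "opr", "opr", "axz", "xyz"],
    }
)

# A child value that never occurs in the parent column has no children
# of its own, so it is a leaf.
is_leaf = ~df["child"].isin(df["parent"])
leaves = df[is_leaf]
print(leaves)
```

Every row whose child is a leaf is kept, so a leaf with several parents (here zxc) yields one row per parent.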
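Once that is done, I iteratively add a new parent column on each pass until all parents are NaN. One more trick is needed to make sure the last level ends up holding the top-level grandparents: wherever the new parent column is NaN, its value must be swapped with the previous column. Code: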
# leaves: children that never appear as a parent
result = df[~df['child'].isin(df['parent'])]
result.columns = ['leaf', 'lev_1']
ix = 1
while True:
    # look up the parent of the current top level
    result = result.merge(df, 'left', left_on=f'lev_{ix}', right_on='child'
                          ).drop(columns='child')
    if (result['parent'].isna().all()):
        # every branch has reached the root: done
        result = result.drop(columns='parent')
        break
    # rows already at the root: swap the root into the new column
    result.loc[result['parent'].isna(), f'lev_{ix}':'parent'
               ] = result[result['parent'].isna()][['parent', f'lev_{ix}']
                                                   ].values
    print(result)  # intermediate state, for inspection only
    ix += 1
    result = result.rename(columns={'parent': f'lev_{ix}'})

# rename and reorder columns to match your expected result
result = result.rename(columns={f'lev_{ix-i}': f'lev_{i}' for i in range(ix)}
                       ).reindex(columns=[f'lev_{i}' for i in range(ix)]
                                 + ['leaf'])
It gives the expected result:
          lev_0     lev_1     leaf
0  grand parent  parent 1  child 1
1  grand parent  parent 1  child 1
2  grand parent  parent 1  child 2
3  grand parent  parent 2  child 2
4  grand parent       NaN  child 2
and it should handle any number of levels.
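To try the "any number of levels" claim, the same loop can be wrapped in a function and run on a deeper hierarchy. This is only a sketch: the name `flatten_hierarchy` and the sample frame `deep` are made up for illustration, and it assumes columns named child and parent, a NaN parent for each root, and exactly one parent for every non-leaf node. It writes the swap step as two explicit assignments rather than the slice assignment used in the answer.

```python
import numpy as np
import pandas as pd


def flatten_hierarchy(df):
    """Flatten a child/parent edge list into one row per leaf edge.

    Hypothetical helper; assumes 'child'/'parent' columns, NaN parents
    for roots, and a single parent for every non-leaf node.
    """
    edges = df[["child", "parent"]]
    # Leaves are child values that never show up in the parent column.
    result = edges[~edges["child"].isin(edges["parent"])]
    result = result.rename(columns={"child": "leaf", "parent": "lev_1"})
    result = result.reset_index(drop=True)
    ix = 1
    while True:
        # Look up the parent of the current top level.
        result = result.merge(
            edges, how="left", left_on=f"lev_{ix}", right_on="child"
        ).drop(columns="child")
        if result["parent"].isna().all():
            result = result.drop(columns="parent")
            break
        # Rows that already reached the root: push the root into the
        # new column and leave NaN behind, so roots end up in the last level.
        at_root = result["parent"].isna()
        result.loc[at_root, "parent"] = result.loc[at_root, f"lev_{ix}"]
        result.loc[at_root, f"lev_{ix}"] = np.nan
        ix += 1
        result = result.rename(columns={"parent": f"lev_{ix}"})
    # Reverse the numbering so that lev_0 is the root level.
    result = result.rename(
        columns={f"lev_{ix - i}": f"lev_{i}" for i in range(ix)}
    )
    return result.reindex(columns=[f"lev_{i}" for i in range(ix)] + ["leaf"])


# Illustrative data: a -> b -> c -> d is three levels deep, a -> e is one.
deep = pd.DataFrame(
    {
        "child": ["a", "b", "c", "d", "e"],
        "parent": [np.nan, "a", "b", "c", "a"],
    }
)
flat = flatten_hierarchy(deep)
print(flat)
```

Branches shorter than the deepest one are padded with NaN in the intermediate levels, the same way the lev_1 column holds NaN for the xyz to zxc edge in the question's expected output.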