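I'm working with a standard DataFrame and building several summary DataFrames from subsets of it. Each summary will have a two-level index, and the first level is the same in all of them. I've been asked to bring all the summary data together (they want a single JSON containing all of it). I assumed combining the DataFrames would be the simplest solution, but I'm running into trouble.
Example of the standard DataFrame, df: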
    ID  DEPT    STATUS TYPE
0  100  5001    Active    E
1  101  5001    Active    M
2  101  5001    Active    M
3  102  5005   Expired    E
4  107  5001  Inactive    M
5  110  5002  Inactive    E
6  110  5002  Inactive    E
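For anyone who wants to reproduce this, the sample frame can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and the dtypes (integers for ID and DEPT, strings for STATUS and TYPE) are an assumption.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed.
df = pd.DataFrame({
    "ID": [100, 101, 101, 102, 107, 110, 110],
    "DEPT": [5001, 5001, 5001, 5005, 5001, 5002, 5002],
    "STATUS": ["Active", "Active", "Active", "Expired",
               "Inactive", "Inactive", "Inactive"],
    "TYPE": ["E", "M", "M", "E", "M", "E", "E"],
})
```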
Then I create the summary data and rename the column:
status_df = pd.DataFrame(df.groupby(['DEPT','STATUS'])['ID'].nunique())
status_df.columns = ['Count_Status']
               Count_Status
DEPT STATUS
5001 Active               2
     Inactive             1
5002 Inactive             1
5005 Expired              1
and then do the same for another column:
type_df = pd.DataFrame(df.groupby(['DEPT','TYPE'])['ID'].nunique())
type_df.columns = ['Count_Type']
           Count_Type
DEPT TYPE
5001 E              1
     M              2
5002 E              1
5005 E              1
What I want to create:
                  Count_Status  Count_Type
DEPT STATUS TYPE
5001 Active                  2         NaN
     Inactive                1         NaN
     E                     NaN           1
     M                     NaN           2
5002 Inactive                1         NaN
     E                     NaN           1
5005 Expired                 1         NaN
     E                     NaN           1
Answer 0 (score: 0)
You can try using pd.concat and set_index:
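# Count unique IDs per (DEPT, STATUS); keep DEPT as the index, move STATUS to a column
d1 = (df.groupby(['DEPT','STATUS'])['ID'].nunique()
        .rename('Count Status')
        .reset_index(level=1))
# Same idea for (DEPT, TYPE)
d2 = (df.groupby(['DEPT','TYPE'])['ID'].nunique()
        .rename('Count Type')
        .reset_index(level=1))
# Stack the two frames, then add STATUS and TYPE to the index alongside DEPT
df_out = (pd.concat([d1, d2], sort=False)
            .set_index(['STATUS','TYPE'], append=True)
            .sort_index())
df_out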
Output:
                    Count Status  Count Type
DEPT STATUS   TYPE
5001 Active   NaN            2.0         NaN
     Inactive NaN            1.0         NaN
     NaN      E              NaN         1.0
              M              NaN         2.0
5002 Inactive NaN            1.0         NaN
     NaN      E              NaN         1.0
5005 Expired  NaN            1.0         NaN
     NaN      E              NaN         1.0
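Since the end goal is a single JSON, here is one possible way to get there from the same two summaries. This is a sketch rather than a definitive recipe: the sample frame is rebuilt from the question's table, the names combined and records are made up for illustration, and orient="records" is just one of several layouts to_json supports. It flattens the index back into columns before exporting, which avoids carrying NaN index levels into the JSON.

```python
import json
import pandas as pd

# Sample data rebuilt from the question's table (dtypes assumed).
df = pd.DataFrame({
    "ID": [100, 101, 101, 102, 107, 110, 110],
    "DEPT": [5001, 5001, 5001, 5005, 5001, 5002, 5002],
    "STATUS": ["Active", "Active", "Active", "Expired",
               "Inactive", "Inactive", "Inactive"],
    "TYPE": ["E", "M", "M", "E", "M", "E", "E"],
})

# Same two summaries as above: DEPT stays in the index,
# the second grouping key becomes an ordinary column.
d1 = (df.groupby(["DEPT", "STATUS"])["ID"].nunique()
        .rename("Count Status")
        .reset_index(level=1))
d2 = (df.groupby(["DEPT", "TYPE"])["ID"].nunique()
        .rename("Count Type")
        .reset_index(level=1))

# Stack the summaries and turn DEPT back into a column so each row
# is a flat record; missing values are written as null in the JSON.
combined = pd.concat([d1, d2], sort=False).reset_index()
json_text = combined.to_json(orient="records")

# Parse it back only to inspect the result.
records = json.loads(json_text)
```

Each record then carries DEPT plus whichever of STATUS / Count Status or TYPE / Count Type applies, with the other pair set to null.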