I have a DataFrame as shown below:
Sector Plot Usage Status
SE1 1 Garden Constructed
SE1 2 School Constructed
SE1 3 Garden Not_Constructed
SE1 4 School Constructed
SE1 5 Garden Not_Constructed
SE1 6 School Not_Constructed
SE2 1 School Constructed
SE2 2 School Constructed
SE2 3 Garden Constructed
SE2 4 School Constructed
SE2 5 School Not_Constructed
SE2 6 School Not_Constructed
From the above, I would like to prepare the following DataFrame.
Expected output:
Sector N_of_Garden_Const N_of_School_Const N_of_Garden_Not_Const N_of_School_Not_Const
SE1 1 2 2 1
SE2 1 3 0 2
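where N_of_Garden_Const = number of gardens constructed
N_of_School_Const = number of schools constructed
N_of_Garden_Not_Const = number of gardens not constructed
N_of_School_Not_Const = number of schools not constructed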
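For reproducibility, the sample table in the question can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table above, and the variable name df is an assumption chosen to match the name used in the answers.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})
print(df)
```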
Answer 0 (score: 3)
Use crosstab and flatten the MultiIndex in the columns with map:
df = pd.crosstab(df['Sector'], [df['Status'], df['Usage']])
df.columns = df.columns.map('N_of_{0[1]}_{0[0]}_Const'.format)
df = df.reset_index()
print (df)
Sector N_of_Garden_Constructed_Const N_of_School_Constructed_Const \
0 SE1 1 2
1 SE2 1 3
N_of_Garden_Not_Constructed_Const N_of_School_Not_Constructed_Const
0 2 1
1 0 2
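Note that the column names above end in Constructed_Const, which differs slightly from the names in the expected output. If the exact names from the question are wanted, one possible variation is to translate each Status value into a short suffix while flattening. This is only a sketch: the suffix dictionary and the counts variable are illustrative names, not part of the original answer, and the sample data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})

# Illustrative mapping (an assumption) from Status values to the
# short suffixes used in the expected output.
suffix = {"Constructed": "Const", "Not_Constructed": "Not_Const"}

# Count rows per Sector for every (Status, Usage) combination.
counts = pd.crosstab(df["Sector"], [df["Status"], df["Usage"]])

# Each column label is a (Status, Usage) tuple; flatten it to one string.
counts.columns = [f"N_of_{usage}_{suffix[status]}"
                  for status, usage in counts.columns]
counts = counts.reset_index()
print(counts)
```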
Another idea, using DataFrame.pivot_table and flattening with f-strings:
df = df.pivot_table(index='Sector',
                    columns=['Status','Usage'],
                    aggfunc='size',
                    fill_value=0)
df.columns = df.columns.map(lambda x: f'N_of_{x[1]}_{x[0]}_Const')
df = df.reset_index()
print (df)
Sector N_of_Garden_Constructed_Const N_of_School_Constructed_Const \
0 SE1 1 2
1 SE2 1 3
N_of_Garden_Not_Constructed_Const N_of_School_Not_Constructed_Const
0 2 1
1 0 2
Answer 1 (score: 2)
I prefer an approach that uses groupby and then unstacks the grouping levels into columns. Check the following code:
dfGrouped = df.groupby(['Sector', 'Usage', 'Status'])['Plot'].count().unstack([-2, -1])
dfGrouped.columns = ['_'.join(col).strip() for col in dfGrouped.columns.values]
dfGrouped.fillna(0, inplace = True)
Output:
Garden_Constructed Garden_Not_Constructed School_Constructed School_Not_Constructed
Sector
SE1 1.0 2.0 2.0 1.0
SE2 1.0 0.0 3.0 2.0
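The counts above come out as floats because unstacking first produces NaN for the missing SE2 / Garden / Not_Constructed combination, and fillna only replaces it afterwards. A possible variation, sketched here under the assumption that plain row counts are wanted, is to use size() and pass fill_value=0 directly to unstack, which should keep the counts as integers. The counts variable name is illustrative, and the sample data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sector": ["SE1"] * 6 + ["SE2"] * 6,
    "Plot": [1, 2, 3, 4, 5, 6] * 2,
    "Usage": ["Garden", "School", "Garden", "School", "Garden", "School",
              "School", "School", "Garden", "School", "School", "School"],
    "Status": ["Constructed", "Constructed", "Not_Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed",
               "Constructed", "Constructed", "Constructed",
               "Constructed", "Not_Constructed", "Not_Constructed"],
})

# size() counts rows per group; unstack moves Usage and Status into the
# columns and fills combinations that never occur with 0.
counts = (
    df.groupby(["Sector", "Usage", "Status"])
      .size()
      .unstack(["Usage", "Status"], fill_value=0)
)

# Flatten the (Usage, Status) column tuples into single strings.
counts.columns = ["_".join(col) for col in counts.columns]
print(counts)
```

Here size() counts rows, whereas count() on the Plot column counts non-null Plot values; the two agree for this data because Plot has no missing entries.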