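I'm looking for a simple way to create a table that compares the means of several variables across groups (for example, by gender).
For example: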
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['obs'] = range(1,1001)
df['gender'] = np.random.choice([1,2],1000)
df["salary"] = np.random.normal(15000, 5000, 1000)
df["Y_education"] = np.random.normal(8, 3, 1000).astype(int)
The table should look something like this (the values are not computed; they are filled in only for illustration):
Variable                  Male(1)  Female(2)
Salary_mean               15820    16852
Salary_sd                 3620     2450
Years_of_Education_mean   9        8
Years_of_Education_sd     1.5      2
I know about .groupby, but as far as I have been able to apply it, it does not produce a table like this.
Answer 0 (score: 3)
Use:
df_out = df.groupby('gender')[['salary', 'Y_education']].agg(['mean', 'std'])
df_out.columns = df_out.columns.map('_'.join)
print(df_out.T.rename(columns={1:'Male(1)', 2:'Female(2)'}))
Output:
gender                 Male(1)     Female(2)
salary_mean       15187.741741  14919.403236
salary_std         4897.463288   5161.409774
Y_education_mean      7.607350      7.548654
Y_education_std       2.877772      3.102538
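Because the sample data is random, the numbers above will differ on every run. As a rough sketch, the same groupby / agg / transpose steps can be wrapped in a small helper and run on seeded data so the table is reproducible. The helper name `group_summary`, its parameters, and the seed value are assumptions made for illustration; they are not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "obs": range(1, 1001),
    "gender": rng.choice([1, 2], 1000),
    "salary": rng.normal(15000, 5000, 1000),
    "Y_education": rng.normal(8, 3, 1000).astype(int),
})


def group_summary(frame, by, variables, stats=("mean", "std"), labels=None):
    """Hypothetical helper: one row per variable/statistic, one column per group."""
    # Aggregate each variable with each statistic; columns become a MultiIndex.
    out = frame.groupby(by)[list(variables)].agg(list(stats))
    # Flatten ('salary', 'mean') into 'salary_mean'.
    out.columns = out.columns.map("_".join)
    # Transpose so the groups become columns and the statistics become rows.
    out = out.T
    if labels:
        out = out.rename(columns=labels)
    out.index.name = "Variable"
    out.columns.name = None
    return out


table = group_summary(
    df,
    by="gender",
    variables=["salary", "Y_education"],
    labels={1: "Male(1)", 2: "Female(2)"},
)
print(table.round(2))
```

Passing a different `stats` tuple, for example `("mean", "std", "count")`, should add one row per variable for each extra statistic.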