Table comparing means

Date: 2018-06-07 14:24:29

Tags: python pandas

I'm looking for a simple way to create a table that compares the means of different variables across groups (for example, by gender):

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['obs'] = range(1,1001)
df['gender'] = np.random.choice([1,2],1000)
df["salary"] = np.random.normal(15000, 5000, 1000)
df["Y_education"] = np.random.normal(8, 3, 1000).astype(int)

The table should look something like this (the values are not computed, just filled in for illustration):

Variable                 Male(1)   Female(2)
Salary_mean              15820     16852
Salary_sd                 3620      2450
Years_of_Education_mean    9         8
Years_of_Education_sd      1.5       2

I know about .groupby, but as far as I have been able to apply it, it does not produce a table like this.

1 Answer:

Answer 0: (score: 3)

Use:

df_out = df.groupby('gender')[['salary', 'Y_education']].agg(['mean', 'std'])
df_out.columns = df_out.columns.map('_'.join)
print(df_out.T.rename(columns={1:'Male(1)', 2:'Female(2)'}))

Output:

gender                 Male(1)     Female(2)
salary_mean       15187.741741  14919.403236
salary_std         4897.463288   5161.409774
Y_education_mean      7.607350      7.548654
Y_education_std       2.877772      3.102538
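
To make the three steps in the answer easier to follow, here is a minimal, self-contained sketch that wraps them in a helper. The function name mean_sd_table, its labels parameter, and the fixed random seed are my own additions for illustration and are not part of the original answer. Note that pandas' std here is the sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd


def mean_sd_table(df, group_col, value_cols, labels=None):
    """Return one row per variable/statistic and one column per group.

    Hypothetical helper: the name and signature are assumptions, not a pandas API.
    """
    # agg with a list of functions yields two-level columns such as ('salary', 'mean')
    stats = df.groupby(group_col)[value_cols].agg(['mean', 'std'])
    # Flatten the two levels into single names such as 'salary_mean'
    stats.columns = ['_'.join(col) for col in stats.columns]
    # Transpose so that groups become columns and statistics become rows
    table = stats.T
    # Optionally replace the group codes with readable labels
    if labels is not None:
        table = table.rename(columns=labels)
    return table


# Same shape of data as in the question; the seed is only there for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'obs': range(1, 1001),
    'gender': rng.choice([1, 2], 1000),
    'salary': rng.normal(15000, 5000, 1000),
    'Y_education': rng.normal(8, 3, 1000).astype(int),
})

table = mean_sd_table(df, 'gender', ['salary', 'Y_education'],
                      labels={1: 'Male(1)', 2: 'Female(2)'})
print(table.round(2))
```

Because the data are random, the printed numbers will differ from the output shown above, but the layout (one row per variable and statistic, one column per gender) should be the same.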