How do I count the number of males/females for each title?

Time: 2019-07-12 12:17:24

Tags: python pandas dataframe pandas-groupby

I am new to data science, and I want to count the number of males and females for each title.

I tried the following code:

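import pandas as pd

# `full` is assumed to be an already-loaded DataFrame with Name, Age and Sex columns
newdf = pd.DataFrame()
# Take the text between the comma and the first period of the name as the title
newdf['Title'] = full['Name'].map(lambda name: name.split(',')[1].split('.')[0].strip())
newdf['Age'] = full['Age']
newdf['Sex'] = full['Sex']
newdf.dropna(axis=0, inplace=True)
print(newdf.head())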

What I get is:

  Title   Age     Sex
0    Mr  22.0    male
1   Mrs  38.0  female
2  Miss  26.0  female
3   Mrs  35.0  female
4    Mr  35.0    male

Then I try to add the #male and #female columns:

df = pd.DataFrame()
df = newdf[['Age','Title']].groupby('Title').mean().sort_values(by='Age',ascending=False)
df['#People'] = newdf['Title'].value_counts()
df['Male'] = newdf['Title'].sum(newdf['Sex']=='male')
df['Female'] = newdf['Title'].sum(newdf['Sex']=='female')

The error message I get: TypeError: 'Series' objects are mutable, thus they cannot be hashed

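I would like to end up with these columns: Title, Age (mean), #People, #male, #female. In other words, I want to know how many of the #People for each title are male and how many are female.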

P.S. Without these lines:

df['Male'] = newdf['Title'].sum(newdf['Sex']=='male')
df['Female'] = newdf['Title'].sum(newdf['Sex']=='female')

everything works, and I get:

             Age  #People
Title
Capt   70.000000        1
Col    54.000000        4
Sir    49.000000        1
Major  48.500000        2
Lady   48.000000        1
Dr     43.571429        7
....

but without the #male and #female columns.

1 Answer:

Answer 0 (score: 1)

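Use GroupBy.agg with mean and size for the aggregation, and for the new columns use crosstab followed by DataFrame.join. Here df stands for the question's newdf (the five sample rows), and the _avg suffix is only a label: those columns hold counts, not averages.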

df1 = (df.groupby('Title')['Age']
         .agg([('Age','mean'),('#People','size')])
         .sort_values(by='Age',ascending=False))

df2 = pd.crosstab(df['Title'], df['Sex']).add_suffix('_avg')

df = df1.join(df2)
print (df)
        Age  #People  female_avg  male_avg
Title                                     
Mrs    36.5        2           2         0
Mr     28.5        2           0         2
Miss   26.0        1           1         0
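
For reference, the TypeError in the question appears to come from `Series.sum` treating its first positional argument as `axis`, so passing a boolean Series there fails before any counting happens. Below is a minimal, self-contained sketch of the approach above, rebuilt on the five sample rows from the question. The variable names `stats`, `counts` and `result`, and the `#male` / `#female` column labels, are my own choices for illustration, and the keyword form of `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# Five sample rows copied from the newdf.head() output in the question.
newdf = pd.DataFrame({
    "Title": ["Mr", "Mrs", "Miss", "Mrs", "Mr"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Mean age and number of rows per title, oldest titles first.
# The dict is unpacked because "#People" is not a valid keyword name.
stats = (
    newdf.groupby("Title")
    .agg(**{"Age": ("Age", "mean"), "#People": ("Age", "size")})
    .sort_values(by="Age", ascending=False)
)

# Count rows for every (Title, Sex) pair; missing pairs are filled with 0.
counts = pd.crosstab(newdf["Title"], newdf["Sex"]).rename(
    columns={"male": "#male", "female": "#female"}
)

# Both frames are indexed by Title, so join aligns them row by row.
result = stats.join(counts)
print(result)
```

Because `crosstab` fills absent combinations with 0, titles that occur for only one sex still get a value in both count columns.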