How do I separate females and males in Python without groupby()?

Asked: 2017-01-07 17:18:24

Tags: python pandas numpy

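I have a DataFrame in which each row has an Id, Sex, Age, and so on. As a first step I pulled the Sex and Age columns out of it and dropped the rows with missing values.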

    import numpy as np
    import pandas as pd
    age_distinct = titanic_df[['Sex','Age']].dropna()
    print(age_distinct)

This gives the following result:

       Sex   Age
0      male  22.0
1    female  38.0
2    female  26.0
3    female  35.0
4      male  35.0
6      male  54.0
7      male   2.0
8    female  27.0
9    female  14.0
10   female   4.0
11   female  58.0
12     male  20.0
13     male  39.0
14   female  14.0
15   female  55.0
16     male   2.0
18   female  31.0
20     male  35.0
21     male  34.0
22   female  15.0
23     male  28.0
24   female   8.0
25   female  38.0
27     male  19.0
30     male  40.0
33     male  66.0
34     male  28.0
35     male  42.0
37     male  21.0
38   female  18.0
..      ...   ...
856  female  45.0
857    male  51.0

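But I don't know what to do next. How can I get two sets of data, one containing only the males and one containing only the females?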

2 answers:

Answer 0: (score: 0)

What you are looking for is:

titanic_df[titanic_df['Sex'] == 'male']

If you are familiar with SQL, this is basically SELECT * FROM titanic_df WHERE Sex = 'male'
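To make the mechanics concrete, here is a minimal sketch of what that comparison does. The frame `demo_df` and its values are made up for illustration and are not the asker's Titanic data. The comparison builds a boolean Series, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df; the values are made up for illustration.
demo_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Comparing a column to a scalar yields one True/False per row.
is_male = demo_df["Sex"] == "male"

# Indexing with the boolean Series keeps the rows where it is True.
male_rows = demo_df[is_male]

# ~ inverts the mask. Rows with a missing Sex would also land here,
# so drop or handle NaN first if the column can contain it.
other_rows = demo_df[~is_male]

print(male_rows)
print(other_rows)
```

The original row labels are preserved in both results, which is why the index of a filtered frame usually has gaps.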

Edit: if you want to create a separate pandas.DataFrame object for each level of Sex, you can store each DataFrame in a dictionary, like this:

distinct_dfs = {}
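# dropna() skips missing values, which would otherwise become a key with an empty frame
for level in titanic_df['Sex'].dropna().unique():
    # keep only the rows whose Sex equals this level
    level_df = titanic_df[titanic_df['Sex'] == level]
    distinct_dfs[level] = level_df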

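This is just one approach you can take, and it pays off when Sex has many distinct values. However, since you only have two values, this is the simplest: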

female_df = titanic_df[titanic_df['Sex'] == 'female']
male_df = titanic_df[titanic_df['Sex'] == 'male']
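The dictionary approach can also be wrapped in a small reusable function. The following is only a sketch: `split_by_value` is a hypothetical helper name, and `demo_df` is made-up data standing in for `titanic_df`. It skips missing values so that NaN never becomes a dictionary key.

```python
import pandas as pd


def split_by_value(df, column):
    """Return {value: sub-DataFrame} for each non-null value in `column`."""
    return {
        value: df[df[column] == value]
        for value in df[column].dropna().unique()
    }


# Hypothetical data; one row has a missing Sex on purpose.
demo_df = pd.DataFrame({
    "Sex": ["male", "female", None, "female"],
    "Age": [40, 50, 60, 30],
})

parts = split_by_value(demo_df, "Sex")
female_df = parts["female"]
male_df = parts["male"]

print(sorted(parts))
print(len(female_df), len(male_df))
```

The row with the missing Sex appears in neither frame, which matches the behaviour of the two explicit comparisons above.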

Answer 1: (score: 0)

I think you need boolean indexing or query:

print(age_distinct[age_distinct.Sex == 'male'])
print(age_distinct.query('Sex == "male"'))

Sample:

titanic_df = pd.DataFrame({'Sex':['male','female',np.nan],
                             'Age':[40,50,60]})

print (titanic_df)
   Age     Sex
0   40    male
1   50  female
2   60     NaN

age_distinct = titanic_df[['Sex','Age']].dropna()

print (age_distinct[age_distinct.Sex == 'male'])
    Sex  Age
0  male   40

print (age_distinct.query('Sex == "male"') )
    Sex  Age
0  male   40
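As a further sketch, the two forms can be checked against each other, and `query` can take the value from a Python variable by prefixing its name with `@`. The frame `demo_df` and the variable `target` are assumptions introduced for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical data, similar in shape to the sample above.
demo_df = pd.DataFrame({
    "Sex": ["male", "female", np.nan, "female"],
    "Age": [40, 50, 60, 30],
})

target = "female"  # hypothetical value chosen at runtime

# Boolean indexing: rows where the comparison is True.
by_mask = demo_df[demo_df["Sex"] == target]

# Inside a query string, @name refers to a Python variable in scope.
by_query = demo_df.query("Sex == @target")

# Both selections should contain the same rows.
print(by_mask.equals(by_query))
```

Which form to use is mostly a matter of taste: boolean indexing works with any column name, while `query` tends to read better when several conditions are combined.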