I have a DataFrame in which every row has an Id, Sex, Age, and so on. As a first step, I pulled the Sex and Age columns out on their own.
import numpy as np
import pandas as pd
age_distinct = titanic_df[['Sex','Age']].dropna()
print(age_distinct)
This gives the following result:
Sex Age
0 male 22.0
1 female 38.0
2 female 26.0
3 female 35.0
4 male 35.0
6 male 54.0
7 male 2.0
8 female 27.0
9 female 14.0
10 female 4.0
11 female 58.0
12 male 20.0
13 male 39.0
14 female 14.0
15 female 55.0
16 male 2.0
18 female 31.0
20 male 35.0
21 male 34.0
22 female 15.0
23 male 28.0
24 female 8.0
25 female 38.0
27 male 19.0
30 male 40.0
33 male 66.0
34 male 28.0
35 male 42.0
37 male 21.0
38 female 18.0
.. ... ...
856 female 45.0
857 male 51.0
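But I don't know what to do next. How can I get two sets of data, one containing only the males and one containing only the females?
Answer 0 (score: 0)
What you are looking for is: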
titanic_df[titanic_df['Sex'] == 'male']
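If you are familiar with SQL, this is essentially SELECT * FROM titanic_df WHERE Sex = 'male'.
Edit: if you want to create a separate pandas.DataFrame object for each level of Sex, you can store each DataFrame in a dictionary, like so: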
# Map each distinct value of Sex to the rows that have that value.
distinct_dfs = {}
for level in set(titanic_df['Sex']):
    level_df = titanic_df[titanic_df['Sex'] == level]
    distinct_dfs[level] = level_df
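This is just one approach you could take, and it pays off when Sex has many distinct values. However, since you only have two values, this would be the simplest: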
female_df = titanic_df[titanic_df['Sex'] == 'female']
male_df = titanic_df[titanic_df['Sex'] == 'male']
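As a possible alternative to the explicit loop, groupby can build the same kind of dictionary in one pass. This is only a minimal sketch: the small titanic_df below is a made-up stand-in for the real data, and the variable names simply mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_df; the values are made up for illustration.
titanic_df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', np.nan],
    'Age': [22.0, 38.0, np.nan, 35.0, 30.0],
})

# Keep only the two columns of interest and drop rows with a missing value.
age_distinct = titanic_df[['Sex', 'Age']].dropna()

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# one pair per distinct value of Sex.
distinct_dfs = {sex: group for sex, group in age_distinct.groupby('Sex')}

male_df = distinct_dfs['male']
female_df = distinct_dfs['female']
print(male_df)
print(female_df)
```

Each sub-DataFrame keeps the original row index, so the rows can still be matched back to titanic_df if needed.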
Answer 1 (score: 0)
I think you need boolean indexing or query:
print(age_distinct[age_distinct.Sex == 'male'])
print(age_distinct.query('Sex == "male"'))
Sample:
titanic_df = pd.DataFrame({'Sex': ['male', 'female', np.nan],
                           'Age': [40, 50, 60]})
print (titanic_df)
Age Sex
0 40 male
1 50 female
2 60 NaN
age_distinct = titanic_df[['Sex','Age']].dropna()
print (age_distinct[age_distinct.Sex == 'male'])
Sex Age
0 male 40
print (age_distinct.query('Sex == "male"') )
Sex Age
0 male 40
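To get both groups from the sample, one option is to build the boolean mask once and reuse it. The sketch below assumes the same sample data as above; the names is_male, male_df, female_df, and male_via_query are chosen here for illustration. Note that negating the mask selects every row that is not 'male', which is the female rows only because dropna() has already removed the missing values and Sex has just two levels.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
titanic_df = pd.DataFrame({'Sex': ['male', 'female', np.nan],
                           'Age': [40, 50, 60]})
age_distinct = titanic_df[['Sex', 'Age']].dropna()

# Boolean mask: True for rows where Sex is 'male'.
is_male = age_distinct['Sex'] == 'male'

male_df = age_distinct[is_male]
# ~ inverts the mask; with only two levels left after dropna(),
# the remaining rows are the female ones.
female_df = age_distinct[~is_male]

# query() should select the same rows as the boolean mask.
male_via_query = age_distinct.query('Sex == "male"')

print(male_df)
print(female_df)
print(male_df.equals(male_via_query))
```

If Sex could take more than two values, comparing explicitly against 'female' would be safer than negating the mask.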