Survived SibSp Parch
0 0 1 0
1 1 1 0
2 1 0 0
3 1 1 0
4 0 0 1
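Given the DataFrame above, is there an elegant way to groupby with a condition?
I want to split the data into two groups based on the following conditions:
(df['SibSp'] > 0) | (df['Parch'] > 0) = New Group - "Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"
Then take the mean of each group, ending up with output like this: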
SurvivedMean
Has Family Mean
No Family Mean
Can this be done with groupby, or do I have to append a new column using the conditional statements above?
Thanks!
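Answer 0 (score: 9)
A simple way to group is to use the sum of those two columns. If either of them is positive, the sum will be at least 1 (assuming the counts are never negative). groupby also accepts an arbitrary array, as long as its length matches the DataFrame's, so you don't need to add a new column.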
import numpy as np
family = np.where((df['SibSp'] + df['Parch']) >= 1, 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out:
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
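For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the five sample rows from the question. The names `means`, `labels`, and `means_alt` are illustrative and not part of the answer. The second half shows an assumed alternative: a boolean Series relabelled with `map` as the group key, which gives the same grouping without `np.where`.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

# Label array with one entry per row; it is never stored on df.
family = np.where((df["SibSp"] + df["Parch"]) >= 1, "Has Family", "No Family")
means = df.groupby(family)["Survived"].mean()

# Alternative (illustrative): a boolean Series, relabelled via map,
# used directly as the grouping key.
has_family = (df["SibSp"] > 0) | (df["Parch"] > 0)
labels = has_family.map({True: "Has Family", False: "No Family"})
means_alt = df.groupby(labels)["Survived"].mean()

print(means)
```

Either key leaves `df` unchanged, so no helper column has to be added or dropped afterwards.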
Answer 1 (score: 1)
If the values in the SibSp and Parch columns are never less than 0, a single condition is enough:
m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
# store the result under a new name so the original df is not overwritten
out = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print(out)
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
If that cannot be guaranteed, use both conditions, with a fallback label for rows that match neither:
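m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
# rows matching neither condition get the label 'Not'
a = np.where(m1, 'Has Family',
             np.where(m2, 'No Family', 'Not'))
# store the result under a new name so the original df is not overwritten
out = df.groupby(a)['Survived'].mean()
print(out)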
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
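When there are more than two conditions, nested `np.where` calls get hard to read. A possible alternative, not part of this answer, is `np.select`, which takes a list of conditions, a matching list of labels, and a default. The sketch below assumes the sample data from the question; `labels` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "SibSp":    [1, 1, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 1],
})

m1 = (df["SibSp"] > 0) | (df["Parch"] > 0)
m2 = (df["SibSp"] == 0) & (df["Parch"] == 0)

# np.select uses the first matching condition for each row; rows that
# match no condition receive the default label.
labels = np.select([m1, m2], ["Has Family", "No Family"], default="Not")

result = df.groupby(labels)["Survived"].mean()
print(result)
```

With this sample no row falls through to the default, so the "Not" group does not appear in the result.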
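Answer 2 (score: 1)
You can define the conditions in a list and use the function group_by_condition below to create a filtered list for each condition. After that, you can unpack the result to pick out the individual groups: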
df = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0}]
conditions = [
    lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),  # has family
    lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]
def group_by_condition(l, conditions):
    return [[item for item in l if condition(item)] for condition in conditions]
[has_family, no_family] = group_by_condition(df, conditions)
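The snippet above stops at the two filtered lists, while the question asks for the mean of Survived per group. A minimal sketch of that last step in plain Python follows. `rows` stands in for the answer's list of dicts, and `survived_mean` is a hypothetical helper introduced here for illustration.

```python
from statistics import mean

# Same three records as in the answer, under a different name.
rows = [
    {"Survived": 0, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 1, "Parch": 0},
    {"Survived": 1, "SibSp": 0, "Parch": 0},
]

conditions = [
    lambda x: x["SibSp"] > 0 or x["Parch"] > 0,     # has family
    lambda x: x["SibSp"] == 0 and x["Parch"] == 0,  # no family
]

def group_by_condition(items, conditions):
    # One filtered list per condition, in the order the conditions are given.
    return [[item for item in items if cond(item)] for cond in conditions]

has_family, no_family = group_by_condition(rows, conditions)

def survived_mean(group):
    # Hypothetical helper: mean of the "Survived" field, or None if the
    # group is empty (statistics.mean raises on empty input).
    return mean(r["Survived"] for r in group) if group else None

means = {
    "Has Family": survived_mean(has_family),
    "No Family": survived_mean(no_family),
}
print(means)
```

This works on a list of dicts only. For an actual pandas DataFrame, the groupby approaches in the other answers are the more direct route.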