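Given a Pandas DataFrame with a categorical column family and a numeric column score, I want to get, for each family, the number of score values above (or below) the median for that family.
Conceptually, what would that look like?
Any help?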
Answer 0 (score: 2)
It sounds like you're looking for something like:
df[df.score > df.groupby('family').score.transform('median')].groupby('family').count()
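A minimal, self-contained sketch of this approach, assuming a small made-up frame with family and score columns (the data are illustrative, not from the question). transform('median') broadcasts each family's median back onto that family's rows, so the comparison is row-wise. The sketch uses size() rather than count() to get one number per family instead of one per column, and the final reindex is an optional extra that reports families with nothing above the median as 0.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question).
df = pd.DataFrame({
    "family": ["a", "b", "c"] * 3,
    "score": [1, 2, 3, 3, 2, 3, 2, 3, 1],
})

# Median of each row's own family, aligned with df's index.
family_median = df.groupby("family")["score"].transform("median")

# Number of rows strictly above their family's median.
above = df[df["score"] > family_median].groupby("family").size()

# Families with no score above the median are filtered out entirely;
# reindex against all families so they show up with a count of 0.
above = above.reindex(df["family"].unique(), fill_value=0)
print(above)
```

Flipping the comparison to < gives the count below the median; values equal to the median are counted in neither case.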
Answer 1 (score: 2)
df.groupby('family')['Score'].apply(lambda x : pd.Series(x>np.median(x)).value_counts())
Sample data:
df = pd.DataFrame({'family': ['a','b','c']*3, 'B': ['d','e','f']*3, 'Score': [1,2,3,3,2,3,2,3,1]})
Output:
Out[31]:
family
a       False    2
        True     1
b       False    2
        True     1
c       False    3
Name: Score, dtype: int64
Bonus:
df.groupby('family')['Score'].apply(lambda x : pd.Series(x>np.median(x)).value_counts()).\
unstack().rename(columns={True:'Above_med',False:'Below_med'})
Out[34]:
        Below_med  Above_med
family
a             2.0        1.0
b             2.0        1.0
c             3.0        NaN
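The NaN for family c appears because none of its scores is strictly above its median, so value_counts() has no True entry to unstack, and the gap also turns the counts into floats. One possible way around this, sketched below with the same sample data, is to build the boolean flag first and cross-tabulate it with pd.crosstab, which fills missing combinations with 0. This is an alternative technique, not the answer's own method. Note that the Below_med column here really means "at or below the median", as it does in the code above.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "family": ["a", "b", "c"] * 3,
    "B": ["d", "e", "f"] * 3,
    "Score": [1, 2, 3, 3, 2, 3, 2, 3, 1],
})

# True where a row's Score is strictly above its family's median.
is_above = df["Score"] > df.groupby("family")["Score"].transform("median")

# Cross-tabulate family against the flag; absent combinations become 0.
# The reindex guards against the case where one flag value never occurs at all.
counts = (
    pd.crosstab(df["family"], is_above)
    .reindex(columns=[False, True], fill_value=0)
    .rename(columns={True: "Above_med", False: "Below_med"})
)
print(counts)
```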
Answer 2 (score: 1)
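You can try something like this (note that it compares against the group mean; use x.median() in place of x.mean() to compare against the median):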
df = pd.DataFrame({'family':['Family '+str(i) for i in np.random.choice(list('ABCD'),100)],'score':np.random.randint(40,100,100)})
above_avg = lambda x: (x>x.mean()).sum()
above_avg.__name__ = 'Above Average'
below_avg = lambda x: (x<=x.mean()).sum()
below_avg.__name__ = 'Below Average'
df.groupby('family')['score'].agg([above_avg, below_avg])
Output:
          Above Average  Below Average
family
Family A              9             12
Family B             11             15
Family C             12             12
Family D             15             14
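Since the question asks about the median rather than the mean, here is a sketch of the same agg pattern with the median swapped in. It uses named functions instead of lambdas with a patched __name__, and a seeded random generator so the illustrative data are reproducible. The function names, the seed, and the data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the made-up data are reproducible (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "family": ["Family " + c for c in rng.choice(list("ABCD"), 100)],
    "score": rng.integers(40, 100, 100),
})

def above_median(x):
    """Number of values strictly above the group's median."""
    return (x > x.median()).sum()

def at_or_below_median(x):
    """Number of values at or below the group's median."""
    return (x <= x.median()).sum()

# Each function becomes one output column, named after the function.
result = df.groupby("family")["score"].agg([above_median, at_or_below_median])
print(result)
```

The two columns always add up to the size of each family, because every score falls into exactly one of the two buckets.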
Answer 3 (score: 1)
I use a lambda to take advantage of numpy:
f = lambda x: (lambda v: np.count_nonzero(v > np.median(v)))(x.values)
df.groupby('family').Score.apply(f)
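The nested lambda above only exists to bind x.values to a local name. A possibly more readable equivalent, sketched with the sample data from Answer 1, is a small named function. The name count_above_median is made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from Answer 1.
df = pd.DataFrame({
    "family": ["a", "b", "c"] * 3,
    "Score": [1, 2, 3, 3, 2, 3, 2, 3, 1],
})

def count_above_median(scores):
    """Count the values strictly above the median of one group."""
    values = scores.to_numpy()  # work on the raw NumPy array
    return np.count_nonzero(values > np.median(values))

result = df.groupby("family")["Score"].apply(count_above_median)
print(result)
```

Changing > to < counts the values below the median instead.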