Calculation on a filtered pandas DataFrame

Time: 2018-04-02 05:17:00

Tags: python pandas dataframe

I have a DataFrame with gender, major and admitted columns.

How do I calculate the admission rate for female Physics majors?

4 answers:

Answer 0: (score: 1)

import numpy as np
np.average(dat['admitted'][(dat['gender']=='female') & (dat['major']=='Physics')].values)

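How it works: (dat['gender']=='female') & (dat['major']=='Physics') creates a boolean pandas Series that selects the matching entries from the dat['admitted'] Series. The .values attribute extracts those entries as a NumPy array. Finally, taking the average of those entries gives the admission rate.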
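
To make the masking step concrete, here is a minimal sketch on a small made-up frame. The five rows of dat below are invented for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
dat = pd.DataFrame({
    "gender":   ["female", "female", "male", "female", "male"],
    "major":    ["Physics", "Physics", "Physics", "Chemistry", "Chemistry"],
    "admitted": [True, False, True, True, False],
})

# Boolean Series: True only where both conditions hold.
mask = (dat["gender"] == "female") & (dat["major"] == "Physics")

# Select the matching entries of 'admitted' and extract them as a NumPy array.
selected = dat["admitted"][mask].values

# Averaging booleans counts True as 1 and False as 0.
rate = np.average(selected)

print(mask.tolist())      # [True, True, False, False, False]
print(selected.tolist())  # [True, False]
print(rate)               # 0.5
```

Only the first two rows are female Physics applicants, and one of the two was admitted, so the rate is 0.5.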

Answer 1: (score: 1)

I think:

df_f = df[(df['gender']=='female') & (df['major']=='Physics')]
df_f['admitted'].mean()

The first line filters for female and Physics. Next, we compute the mean.

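Taking the mean may sound unintuitive, but mathematically it gives the rate. Python treats boolean values as 0 and 1, so summing them and dividing by the count (which is what mean does) computes the fraction of female Physics applicants who were admitted. Multiply by 100 if you need a percentage.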
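
The sum-and-divide reasoning above can be checked directly. This is a small sketch with made-up admission flags; the admitted Series here is hypothetical, not the asker's column.

```python
import pandas as pd

# Hypothetical admission flags, invented for illustration only.
admitted = pd.Series([True, False, True, True])

total_admitted = admitted.sum()   # each True counts as 1, so this is 3
count = admitted.count()          # number of non-missing entries, here 4
manual_rate = total_admitted / count

# mean() performs the same sum-and-divide in one step.
assert manual_rate == admitted.mean()

print(manual_rate)        # fraction: 0.75
print(manual_rate * 100)  # percentage: 75.0
```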

Answer 2: (score: 0)

import numpy as np
import pandas as pd
df = pd.DataFrame({"gender":np.random.choice(["male","female"],[20]),
                   "admitted":np.random.choice([True,False],[20]),
                   "major":np.random.choice(["Chemistry","Physics"],[20])})

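# Female Physics applicants who were admitted (the numerator)
phy_female_admitted = df.loc[(df["major"] == "Physics") & df["admitted"] & (df["gender"] == "female")]
# All female Physics applicants (the denominator)
phy_female_applied = df.loc[(df["major"] == "Physics") & (df["gender"] == "female")]

# Note: raises ZeroDivisionError if the random sample has no female Physics applicants
acceptance_rate = phy_female_admitted.shape[0] / phy_female_applied.shape[0]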

A more verbose answer, but essentially the same as DZurico's.

Ignore the lines where I create the DataFrame and use your own data instead.

Answer 3: (score: 0)

A solution that computes all admission rates with groupby and GroupBy.size, followed by transform with sum:

a = df.groupby(['gender' ,'admitted', 'major']).size()
print (a)
gender  admitted  major    
female  False     Chemistry    3
        True      Chemistry    1
                  Physics      1
male    False     Physics      1
        True      Physics      4
dtype: int64

b = a.groupby(['gender' ,'major']).transform('sum')
print (b)
gender  admitted  major    
female  False     Chemistry    4
        True      Chemistry    4
                  Physics      1
male    False     Physics      5
        True      Physics      5
dtype: int64

c = a.div(b)
print (c)
gender  admitted  major    
female  False     Chemistry    0.75
        True      Chemistry    0.25
                  Physics      1.00
male    False     Physics      0.20
        True      Physics      0.80
dtype: float64

Select the row you need from c by tuple:

print (c.loc[('female',True,'Physics')])
1.0

If you want all the values in a DataFrame:

d = a.div(b).reset_index(name='rates')
print (d)
   gender  admitted      major  rates
0  female     False  Chemistry   0.75
1  female      True  Chemistry   0.25
2  female      True    Physics   1.00
3    male     False    Physics   0.20
4    male      True    Physics   0.80
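
If only the admitted rates are needed, a shorter route (an alternative sketch, not the method used in this answer) is to group by gender and major and take the mean of the boolean admitted column. The df below is hypothetical, built to reproduce the counts printed above.

```python
import pandas as pd

# Hypothetical data, constructed to match the group counts shown above.
df = pd.DataFrame({
    "gender": ["female"] * 5 + ["male"] * 5,
    "major": ["Chemistry"] * 4 + ["Physics"] * 6,
    "admitted": [False, False, False, True, True,
                 False, True, True, True, True],
})

# The mean of a boolean column per group is the share of True values in that group.
rates = df.groupby(["gender", "major"])["admitted"].mean()

print(rates)
print(rates.loc[("female", "Physics")])  # 1.0
```

This gives one rate per (gender, major) pair, matching the admitted == True rows of c, but it does not produce the complementary rejection rates.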