I have a DataFrame in which rows are classified as "Pass" or "Fail". I am trying to derive an overall judgement for each item from its pass/fail counts.
pandas version: 0.23.4
Given the following df:
*Note: there are several other columns, but only these two matter for this purpose.
Name Judgement
A Pass
A Fail
A Fail
A Pass
X Pass
X Pass
Z Pass
Z Pass
Z Fail
F Pass
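To make the overall judgement, we look at how many times each item passes and fails. Items that occur more than twice are judged "overall Pass" only if (number of passes == number of fails). Items that occur only once need no further judgement.
Example output: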
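The rule above can be sketched as a plain function on the two counts. The name `overall_judgement` is hypothetical, and the handling of items with one or two occurrences is an assumption, since the rule only spells out the more-than-twice case: here such items pass when they never failed, and equal counts pass at any size.

```python
def overall_judgement(n_pass, n_fail):
    """Overall judgement for one item from its pass/fail counts.

    Assumption (not stated in the question): items seen at most twice
    pass when they have no fails; equal counts pass regardless of size.
    """
    total = n_pass + n_fail
    if n_pass == n_fail:
        return "Pass"
    if total <= 2 and n_fail == 0:
        return "Pass"
    return "Fail"


# Counts taken from the sample data: A, X, Z, F
for name, (p, f) in {"A": (2, 2), "X": (2, 0), "Z": (2, 1), "F": (1, 0)}.items():
    print(name, overall_judgement(p, f))
```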
Name Judgement
A Pass
X Pass
Z Fail
F Pass
Notice that A passes because it has 2 passes and 2 fails, so 2/2 = 1 == Pass.
Z fails because it has 2 passes and 1 fail, so 2/1 = 2 == Fail.
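My idea:
Group on df['Name'] together with Judgement, and simply count how many times each judgement type occurs for each name. Is there a cleaner way to do this? The idea seems a bit cumbersome, but it is all I could come up with.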
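For reference, the counting step described in this idea might look like the sketch below. It uses `pd.crosstab` as one possible way to count per name and judgement type; the sample frame is rebuilt from the table in the question, and the variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Name": list("AAAAXXZZZF"),
    "Judgement": ["Pass", "Fail", "Fail", "Pass", "Pass",
                  "Pass", "Pass", "Pass", "Fail", "Pass"],
})

# One row per Name, one column per judgement type, cells hold occurrence counts
counts = pd.crosstab(df["Name"], df["Judgement"])
print(counts)
```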
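Answer 0 (score: 2)
Is this what you need? A mean of 0.5 means passes and fails are equal; a mean of 1 means every row passed, which counts here only for items with at most two occurrences. Either condition yields Pass.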
s=df.Judgement.eq('Pass').groupby(df['Name']).agg(['mean','count'])
((s['mean'].eq(1)&s['count'].le(2))|s['mean'].eq(0.5)).map({True:'Pass',False:'Fail'})
Out[436]:
Name
A Pass
F Pass
X Pass
Z Fail
dtype: object
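The result above is sorted by Name, whereas the expected output in the question keeps the order of first appearance (A, X, Z, F) and has two columns. A minimal sketch of one way to get that shape, assuming `sort=False` on the groupby and a final `reset_index`; the sample frame is rebuilt from the question and the names `passed` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("AAAAXXZZZF"),
    "Judgement": ["Pass", "Fail", "Fail", "Pass", "Pass",
                  "Pass", "Pass", "Pass", "Fail", "Pass"],
})

# sort=False keeps groups in order of first appearance instead of sorting by key
s = df["Judgement"].eq("Pass").groupby(df["Name"], sort=False).agg(["mean", "count"])

# Same condition as in the answer: all passes with at most two rows, or an even split
passed = (s["mean"].eq(1) & s["count"].le(2)) | s["mean"].eq(0.5)

# Turn the boolean Series into a two-column frame like the expected output
result = passed.map({True: "Pass", False: "Fail"}).rename("Judgement").reset_index()
print(result)
```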
Answer 1 (score: 2)
Here is my approach:
new_df = df.Judgement.eq('Pass').groupby(df['Name']).agg(['size', 'mean', 'max'])
is_passed = (
    # items with more than two occurrences pass only when passes == fails
    (new_df['mean'].eq(0.5) & new_df['size'].gt(2))
    # items with one or two occurrences pass if they have at least one pass
    | (new_df['size'].le(2) & new_df['max'])
)
which yields:
Name
A True
F True
X True
Z False
dtype: bool
Equivalently, we can do:
is_passed = np.where(new_df['size'].le(2), new_df['max'] , new_df['mean'].eq(0.5))
You can then use np.where to turn the mask into pass/fail labels:
np.where(is_passed, 'pass', 'fail')
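Note that `np.where` returns a bare NumPy array, so the Name index is lost. The sketch below puts the pieces together and wraps the labels back into a Series. It assumes the sample frame from the question; the name `labels` and the capitalised label strings are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": list("AAAAXXZZZF"),
    "Judgement": ["Pass", "Fail", "Fail", "Pass", "Pass",
                  "Pass", "Pass", "Pass", "Fail", "Pass"],
})

new_df = df["Judgement"].eq("Pass").groupby(df["Name"]).agg(["size", "mean", "max"])

# Small groups (size <= 2) pass if they contain a pass; larger groups need an even split.
# astype(bool) is a defensive cast in case 'max' does not come back as boolean.
is_passed = np.where(new_df["size"].le(2),
                     new_df["max"].astype(bool),
                     new_df["mean"].eq(0.5))

# Re-attach the Name index that np.where drops
labels = pd.Series(np.where(is_passed, "Pass", "Fail"),
                   index=new_df.index, name="Judgement")
print(labels)
```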
Answer 2 (score: 1)
With a custom apply function:
In [334]: def compare_pass_fail(x):
     ...:     v_counts = x['Judgement'].value_counts()
     ...:     return 'Pass' if ('Fail' not in v_counts or v_counts.get('Pass') == v_counts['Fail']) else 'Fail'
     ...:
In [335]: df.groupby('Name').apply(compare_pass_fail)
Out[335]:
Name
A Pass
F Pass
X Pass
Z Fail
dtype: object
Answer 3 (score: 1)
I used pandas groupby with apply. The logic may differ from what you described, but it works for your case.
df = pd.DataFrame({"Name": ["A","A","A","A","X","X","Z","Z","Z","F"], "Judgement" : ["Pass","Fail","Fail","Pass","Pass","Pass","Pass","Pass","Fail","Pass"]})
Name Judgement
0 A Pass
1 A Fail
2 A Fail
3 A Pass
4 X Pass
5 X Pass
6 Z Pass
7 Z Pass
8 Z Fail
9 F Pass
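def func(x):
    # count passes and fails within one Name group
    n_pass = len(x[x["Judgement"] == "Pass"])
    n_fail = len(x[x["Judgement"] == "Fail"])
    if n_pass * n_fail == 0:
        # only one kind of judgement is present: return it as-is
        return x["Judgement"].unique()[0]
    # mixed group: pass only when the counts are equal
    return "Pass" if n_pass == n_fail else "Fail"

df.groupby("Name").apply(func)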
Name
A Pass
F Pass
X Pass
Z Fail
dtype: object
Answer 4 (score: 0)
You can also first build a DataFrame of pass/fail counts and work from that:
df_count= df.groupby(['Name', 'Judgement']).apply(len).unstack(-1).fillna(0)
Then process its columns:
((df_count['Fail'] == df_count['Pass']) | ((df_count['Fail'] == 0) & (df_count['Pass'].le(2)))).map({True: 'Pass', False: 'Fail'})
The overall result is:
Name
A Pass
F Pass
X Pass
Z Fail
dtype: object
df_count is handy for checking the result and looks like this:
Judgement Fail Pass
Name
A 2.0 2.0
F 0.0 1.0
X 0.0 2.0
Z 1.0 2.0
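The counts show up as floats because unstack first introduces NaN for missing combinations and fillna(0) runs afterwards. If integer counts are preferred, one possible variant (an assumption about preference, not part of the answer) is `size()` followed by `unstack(fill_value=0)`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("AAAAXXZZZF"),
    "Judgement": ["Pass", "Fail", "Fail", "Pass", "Pass",
                  "Pass", "Pass", "Pass", "Fail", "Pass"],
})

# size() counts rows per (Name, Judgement); fill_value=0 avoids NaN, so counts stay integers
df_count = df.groupby(["Name", "Judgement"]).size().unstack(fill_value=0)

# Same condition as in the answer above
result = (
    (df_count["Fail"] == df_count["Pass"])
    | ((df_count["Fail"] == 0) & df_count["Pass"].le(2))
).map({True: "Pass", False: "Fail"})

print(df_count)
print(result)
```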