Sample DF:
ID Match1 Match2 Match3 Match4 Match5
1 Yes No Yes Yes Yes
2 Yes No Yes Yes No
2 Yes No No Yes Yes
3 No Yes Yes Yes No
3 No Yes No No No
4 Yes No Yes No No
4 Yes No Yes Yes Yes
Expected DF:
ID Match1 Match2 Match3 Match4 Match5 Final_Match
1 Yes No Yes Yes Yes Clear
2 Yes No Yes Yes No Unclear
2 Yes No No Yes Yes Unclear
3 No Yes Yes Yes No Clear
3 No Yes No No No Unclear
4 Yes No Yes No No Unclear
4 Yes No Yes Yes Yes Clear
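Problem statement:
If an ID is not repeated, put Clear in the Final_Match column (example: ID 1).
If an ID is repeated, count the Yes values in columns Match1 to Match5 for each row within that ID; the row with the larger number of Yes values gets Clear and the other gets Unclear (examples: IDs 3 and 4).
If an ID is repeated and its rows have an equal number of Yes values in columns Match1 to Match5, put Unclear on both rows (example: ID 2).
I couldn't find a way to do this within each ID. Is there a solution?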
Answer 0 (score: 2)
Another approach:
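import numpy as np

# Count the 'Yes' values per row across Match1..Match5
df['sum_yes'] = df.iloc[:, 1:6].eq('Yes').sum(axis=1)
# Within each ID, a row is 'Clear' only if it holds the maximum count and that count is not duplicated
df['final'] = df.groupby('ID')['sum_yes'].transform(
    lambda x: np.where((x == x.max()) & (~x.duplicated(keep=False)), 'Clear', 'Unclear'))
print(df)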
ID Match1 Match2 Match3 Match4 Match5 sum_yes final
0 1 Yes No Yes Yes Yes 4 Clear
1 2 Yes No Yes Yes No 3 Unclear
2 2 Yes No No Yes Yes 3 Unclear
3 3 No Yes Yes Yes No 3 Clear
4 3 No Yes No No No 1 Unclear
5 4 Yes No Yes No No 2 Unclear
6 4 Yes No Yes Yes Yes 4 Clear
P.S. You can drop the sum_yes column afterwards if you don't need it.
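To try this answer end to end, here is a minimal sketch that rebuilds the sample frame from the question, applies the same count-and-compare logic, and then drops the helper column. The variable names `columns`, `rows`, and `match_cols` are illustrative, and the result column is named `Final_Match` here (rather than `final`) only to match the expected output in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
columns = ["ID", "Match1", "Match2", "Match3", "Match4", "Match5"]
rows = [
    (1, "Yes", "No", "Yes", "Yes", "Yes"),
    (2, "Yes", "No", "Yes", "Yes", "No"),
    (2, "Yes", "No", "No", "Yes", "Yes"),
    (3, "No", "Yes", "Yes", "Yes", "No"),
    (3, "No", "Yes", "No", "No", "No"),
    (4, "Yes", "No", "Yes", "No", "No"),
    (4, "Yes", "No", "Yes", "Yes", "Yes"),
]
df = pd.DataFrame(rows, columns=columns)

# Select the Match columns by name instead of by position (an illustrative choice)
match_cols = [c for c in df.columns if c.startswith("Match")]
df["sum_yes"] = df[match_cols].eq("Yes").sum(axis=1)

# 'Clear' only for a row whose count is the group maximum and is not duplicated
df["Final_Match"] = df.groupby("ID")["sum_yes"].transform(
    lambda x: np.where((x == x.max()) & (~x.duplicated(keep=False)), "Clear", "Unclear")
)

# Drop the helper column once the labels are in place
df = df.drop(columns="sum_yes")
print(df)
```

Selecting columns by name is slightly more robust than `iloc[:, 1:6]` if extra columns are added later, but either works on the sample as given.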
Answer 1 (score: 2)
You can also do this with GroupBy.rank:
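# Helper Series: number of 'Yes' values per row (assumes numpy is imported as np)
s = (df.replace({'Yes': 1, 'No': 0})
       .iloc[:, 1:]
       .sum(axis=1))
# Rank within each ID, highest count first; only a row with rank exactly 1 is 'Clear'
df['final_match'] = np.where(s.groupby(df['ID']).rank(ascending=False).eq(1), 'Clear', 'Unclear')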
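Why the rank check handles ties may not be obvious: `rank` defaults to `method="average"`, so rows tied for the top spot share rank 1.5 rather than 1. The sketch below shows the intermediate ranks, using the per-row Yes counts of the sample data; the names `ids`, `yes_counts`, `ranks`, and `labels` are illustrative.

```python
import pandas as pd

# IDs and per-row 'Yes' counts for the sample data
ids = pd.Series([1, 2, 2, 3, 3, 4, 4], name="ID")
yes_counts = pd.Series([4, 3, 3, 3, 1, 2, 4], name="yes_count")

# Descending rank within each ID; ties receive the average of their ranks
ranks = yes_counts.groupby(ids).rank(ascending=False)
print(ranks.tolist())  # [1.0, 1.5, 1.5, 1.0, 2.0, 2.0, 1.0]

# Only a unique top row has rank exactly 1
labels = ranks.eq(1).map({True: "Clear", False: "Unclear"})
print(labels.tolist())
```

The two rows of ID 2 both get rank 1.5, so neither passes `.eq(1)` and both end up Unclear, while a single-row ID always ranks 1.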
Answer 2 (score: 1)
Using pandas.DataFrame.groupby:
final_match = []
for i, d in df.groupby('ID'):
    if len(d) == 1:
        final_match.append('Clear')
    else:
        counter = (d.filter(like='Match') == 'Yes').sum(1)
        if counter.nunique() == 1:
            final_match.extend(['Unclear'] * len(d))
        else:
            final_match.extend(counter.apply(lambda x: 'Clear' if x == max(counter) else 'Unclear').tolist())
df['final_match'] = final_match
print(df)
ID Match1 Match2 Match3 Match4 Match5 final_match
0 1 Yes No Yes Yes Yes Clear
1 2 Yes No Yes Yes No Unclear
2 2 Yes No No Yes Yes Unclear
3 3 No Yes Yes Yes No Clear
4 3 No Yes No No No Unclear
5 4 Yes No Yes No No Unclear
6 4 Yes No Yes Yes Yes Clear
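Explanation:
len(d) == 1: if the ID is not repeated, append Clear.
counter = (d.filter(like='Match') == 'Yes').sum(1): count the number of Yes values in each row.
counter.nunique() == 1: if all rows have the same number of Yes values, mark all of them Unclear.
counter.apply(lambda x: 'Clear' if x == max(counter) else 'Unclear').tolist(): if the counts differ, mark the row(s) with the highest count Clear and the rest Unclear.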
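The steps above can also be sketched as a function that writes labels by index instead of appending to a list. This is a variation, not the answer's exact code: the list-based loop relies on the rows already being sorted by ID (groupby iterates groups in sorted key order), whereas assigning by index does not. The names `label_by_loop`, `sample`, and `tie3` are hypothetical, and `tie3` is made-up data for a case the question does not cover.

```python
import pandas as pd

def label_by_loop(df: pd.DataFrame) -> pd.Series:
    """Label rows per ID group following the loop logic described above."""
    labels = pd.Series("Unclear", index=df.index)
    for _, group in df.groupby("ID"):
        # Number of 'Yes' values in each row of this group
        counts = (group.filter(like="Match") == "Yes").sum(axis=1)
        # Single-row group, or counts differ: rows at the maximum become 'Clear'
        if len(group) == 1 or counts.nunique() > 1:
            labels.loc[counts.index[counts == counts.max()]] = "Clear"
    return labels

columns = ["ID", "Match1", "Match2", "Match3", "Match4", "Match5"]
sample = pd.DataFrame([
    (1, "Yes", "No", "Yes", "Yes", "Yes"),
    (2, "Yes", "No", "Yes", "Yes", "No"),
    (2, "Yes", "No", "No", "Yes", "Yes"),
    (3, "No", "Yes", "Yes", "Yes", "No"),
    (3, "No", "Yes", "No", "No", "No"),
    (4, "Yes", "No", "Yes", "No", "No"),
    (4, "Yes", "No", "Yes", "Yes", "Yes"),
], columns=columns)
print(label_by_loop(sample).tolist())

# Hypothetical three-row group where two rows tie for the maximum
tie3 = pd.DataFrame([
    (9, "Yes", "Yes", "Yes", "No", "No"),
    (9, "Yes", "Yes", "No", "Yes", "No"),
    (9, "Yes", "No", "No", "No", "No"),
], columns=columns)
print(label_by_loop(tie3).tolist())  # ['Clear', 'Clear', 'Unclear']
```

Note the three-row case: this logic marks both tied top rows Clear, whereas the duplicated-based and rank-based answers would mark all three rows Unclear. The question only shows IDs with one or two rows, so which behavior is wanted for larger groups is left open.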