I'm trying to implement a simple voting score on a CSV file using pandas. Basically, if dataframe['C'] == 'Active' and dataframe['Count'] == 0, then dataframe['Combo'] should be 0. If dataframe['C'] == 'Active' and dataframe['Count'] == 1, then dataframe['Combo'] should be 1. If dataframe['C'] == 'Active' and dataframe['Count'] == 2, then dataframe['Combo'] should be 2, and so on.
Here is my dataframe:
A     B     C         Count  Combo
Ptn1  Lig1  Inactive  0
Ptn1  Lig1  Inactive  1
Ptn1  Lig1  Active    2      2
Ptn2  Lig2  Active    0      0
Ptn2  Lig2  Inactive  1
Ptn3  Lig3  Active    0      0
Ptn3  Lig3  Inactive  1
Ptn3  Lig3  Inactive  2
Ptn3  Lig3  Inactive  3
Ptn3  Lig3  Active    4      3
Here is my code so far, for clarity:
import pandas as pd

df = pd.read_csv('affinity.csv')
VOTE = 0
df['Combo'] = ''
# 'Classification' is the column abbreviated as 'C' in the table above.
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 0), 'Combo'] = VOTE
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 1), 'Combo'] = VOTE + 1
df.loc[(df['Classification'] == 'Active') & (df['Count'] == 2), 'Combo'] = VOTE + 2
df.loc[(df['Classification'] == 'Active') & (df['Count'] > 3), 'Combo'] = VOTE + 3
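My code does this correctly. However, the Ptn3-Lig3 pair has two 'Active' values: one at dataframe['Count'] == 0 and another at dataframe['Count'] == 4.
Is there a way to ignore the second value (i.e. consider only the smallest dataframe['Count'] value) and add the corresponding number to dataframe['Combo']?
I know pandas.DataFrame.drop_duplicates() might be a way to select only the unique values, but it would be nice to avoid deleting any rows.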
Answer 0 (score: 1)
You can do a groupby + apply:
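import numpy as np

def foo(x):
    # Boolean mask of the 'Active' rows in this (A, B) group.
    m = x['C'].eq('Active')
    if m.any():
        # Active rows get the Count of the group's first Active row; others get NaN.
        return pd.Series(np.where(m, x.loc[m, 'Count'].head(1), np.nan))
    else:
        return pd.Series([np.nan] * len(x))

df['Combo'] = df.groupby(['A', 'B'], group_keys=False).apply(foo).values
print(df)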
      A     B         C  Count Combo
0  Ptn1  Lig1  Inactive      0
1  Ptn1  Lig1  Inactive      1
2  Ptn1  Lig1    Active      2     2
3  Ptn2  Lig2    Active      0     0
4  Ptn2  Lig2  Inactive      1
5  Ptn3  Lig3    Active      0     0
6  Ptn3  Lig3  Inactive      1
7  Ptn3  Lig3  Inactive      2
8  Ptn3  Lig3  Inactive      3
9  Ptn3  Lig3    Active      4     0
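Note that the function above picks the first 'Active' row in each group, which is the smallest Count only when the rows are already ordered by Count within each pair. As a rough sketch of the same idea without a custom function, assuming the sample data is rebuilt inline (the real data would come from affinity.csv), the per-pair minimum can be computed with transform:

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration).
df = pd.DataFrame({
    'A': ['Ptn1'] * 3 + ['Ptn2'] * 2 + ['Ptn3'] * 5,
    'B': ['Lig1'] * 3 + ['Lig2'] * 2 + ['Lig3'] * 5,
    'C': ['Inactive', 'Inactive', 'Active', 'Active', 'Inactive',
          'Active', 'Inactive', 'Inactive', 'Inactive', 'Active'],
    'Count': [0, 1, 2, 0, 1, 0, 1, 2, 3, 4],
})

active = df['C'].eq('Active')
# Keep Count only on Active rows (NaN elsewhere), then broadcast the
# per-(A, B) minimum back to every row; min skips NaN by default.
min_active = df['Count'].where(active).groupby([df['A'], df['B']]).transform('min')
# Write the value onto Active rows only; Inactive rows stay NaN.
df['Combo'] = min_active.where(active)
print(df)
```

This should give the same Combo values as the table above, with NaN where the table shows blanks, and it does not depend on row order inside a group.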
Another option with groupby + merge:
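# Select columns with a list ([['C', 'Count']]); tuple-style selection
# (['C', 'Count']) is no longer accepted by recent pandas versions.
df = df.groupby(['A', 'B', 'C'])[['C', 'Count']]\
    .apply(lambda x: x['Count'].values[0] if x['C'].eq('Active').any() else np.nan)\
    .reset_index(name='Combo').fillna('').merge(df)
print(df)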
      A     B         C Combo  Count
0  Ptn1  Lig1    Active     2      2
1  Ptn1  Lig1  Inactive            0
2  Ptn1  Lig1  Inactive            1
3  Ptn2  Lig2    Active     0      0
4  Ptn2  Lig2  Inactive            1
5  Ptn3  Lig3    Active     0      0
6  Ptn3  Lig3    Active     0      4
7  Ptn3  Lig3  Inactive            1
8  Ptn3  Lig3  Inactive            2
9  Ptn3  Lig3  Inactive            3
Note that this ends up sorting your groups.
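If the original row order matters, or if only the lowest-Count 'Active' row of each pair should receive a score (leaving the later 'Active' row empty, which is one reading of the question), a sketch along these lines may help. It is not part of the answer above; it uses idxmin on the 'Active' rows and assumes the sample data is rebuilt inline:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration).
df = pd.DataFrame({
    'A': ['Ptn1'] * 3 + ['Ptn2'] * 2 + ['Ptn3'] * 5,
    'B': ['Lig1'] * 3 + ['Lig2'] * 2 + ['Lig3'] * 5,
    'C': ['Inactive', 'Inactive', 'Active', 'Active', 'Inactive',
          'Active', 'Inactive', 'Inactive', 'Inactive', 'Active'],
    'Count': [0, 1, 2, 0, 1, 0, 1, 2, 3, 4],
})

active_rows = df[df['C'].eq('Active')]
# Index label of the smallest-Count Active row in each (A, B) pair.
rows = active_rows.groupby(['A', 'B'])['Count'].idxmin().to_numpy()

df['Combo'] = np.nan
# Only the selected rows get a score; every other row stays NaN.
df.loc[rows, 'Combo'] = df.loc[rows, 'Count']
print(df)
```

Because the result is assigned by index label, no rows are dropped or reordered; here the second 'Active' row of Ptn3-Lig3 (Count 4) is simply left as NaN.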