I have this pandas df:
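id   Vote1     Vote2     Vote3
123  Positive  Negative  Positive
223  Positive  Negative  Neutral
323  Positive  Negative  Negative
423  Positive  Positive

I want to add another column named winner. It should be set to the majority vote; if there is a tie, the first vote is used, as shown for id = 223.
So the resulting df should be
id   Vote1     Vote2     Vote3     winner
123  Positive  Negative  Positive  Positive
223  Positive  Negative  Neutral   Positive
323  Positive  Negative  Negative  Negative
423  Positive  Positive            Positive

This may be related to: Update Pandas Cells based on Column Values and Other Columns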
Answer 0 (score: 2)
You can do this row by row, as follows:
import pandas as pd
import numpy as np
# Create the dataframe
df = pd.DataFrame()
df['id']=[123,223,323,423]
df['Vote1']=['Positive']*4
df['Vote2']=['Negative']*3+['Positive']
df['Vote3']=['Positive','Neutral','Negative','']
mostCommonVote=[]
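for row in df[['Vote1','Vote2','Vote3']].values:
    # Count how often each distinct vote occurs in this row
    votes, values = np.unique(row, return_counts=True)
    if np.all(values<=1):
        # No vote occurs more than once: tie, so take the first vote
        mostCommonVote.append( row[0] )
    else:
        mostCommonVote.append( votes[np.argmax(values)] )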
df['Winner'] = mostCommonVote
Result:
df:
    id     Vote1     Vote2     Vote3    Winner
0  123  Positive  Negative  Positive  Positive
1  223  Positive  Negative   Neutral  Positive
2  323  Positive  Negative  Negative  Negative
3  423  Positive  Positive            Positive
It may not be the most elegant solution, but it is very simple. It uses the numpy function unique, which can return the count of each unique string in a row.
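To make the counting step concrete, here is a minimal sketch of what np.unique returns for a single row. The row is taken from id 123 in the question; the variable names row, counts and winner are only illustrative.

```python
import numpy as np

# One row of votes, taken from id 123 in the question.
row = np.array(['Positive', 'Negative', 'Positive'])

# np.unique returns the distinct values in sorted order and, with
# return_counts=True, how often each of them occurs.
votes, counts = np.unique(row, return_counts=True)
print(votes)   # ['Negative' 'Positive']
print(counts)  # [1 2]

# If no value occurs more than once there is no majority,
# so fall back to the first vote; otherwise take the most frequent one.
winner = row[0] if np.all(counts <= 1) else votes[np.argmax(counts)]
print(winner)  # Positive
```

With three votes per row, either all counts are 1 (a tie) or one value has a strict majority, so np.argmax never has to choose between equal counts here.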
Answer 1 (score: 1)
Another Pandas solution, without a loop:
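import numpy as np

df = df.set_index('id')
# Score each vote so a row can be summed (assumes a missing vote is stored as NaN, as in the output)
rep = {'Positive':1,'Negative':-1,'Neutral':0}
df1 = df.replace(rep)
# Positive sum -> 'Positive', negative sum -> 'Negative', zero -> the first vote
df = df.assign(Winner=np.where(df1.sum(axis=1) > 0,'Positive',np.where(df1.sum(axis=1) < 0, 'Negative', df.iloc[:,0])))
print(df)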
Output:
        Vote1     Vote2     Vote3    Winner
id
123  Positive  Negative  Positive  Positive
223  Positive  Negative   Neutral  Positive
323  Positive  Negative  Negative  Negative
423  Positive  Positive       NaN  Positive
df.assign creates the new column in a copy of the original dataframe, so you have to assign the result back to df. The column is named Winner, hence 'Winner='.
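A small sketch of the copy behaviour, using a made-up two-row frame rather than the question's data; the names result and the index values are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Vote1': ['Positive', 'Positive']}, index=[123, 223])

# assign returns a new DataFrame; the original is left unchanged.
result = df.assign(Winner=df['Vote1'])
print(list(df.columns))      # ['Vote1']
print(list(result.columns))  # ['Vote1', 'Winner']

# Rebinding the name is what keeps the new column.
df = df.assign(Winner=df['Vote1'])
print(list(df.columns))      # ['Vote1', 'Winner']
```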
Next, np.where is used as a nested if statement: np.where(cond, result, else)
np.where(df1.sum(axis=1) > 0,           # sum the numeric dataframe by row
         'Positive',                    # if true
         np.where(df1.sum(axis=1) < 0,  # nested if, used when the first condition is false
                  'Negative',           # sum of the row is less than 0
                  df.iloc[:,0]          # sum = 0: take the first value from that row
                  )
         )
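The nested np.where can be tried in isolation on a few row sums. The arrays scores and first_vote below are hypothetical stand-ins for df1.sum(axis=1) and the first vote column.

```python
import numpy as np

# Hypothetical row sums after mapping Positive=1, Negative=-1, Neutral=0.
scores = np.array([1, 0, -1])
# Hypothetical first votes, used only where the sum is exactly 0.
first_vote = np.array(['Positive', 'Positive', 'Positive'])

winner = np.where(scores > 0, 'Positive',
                  np.where(scores < 0, 'Negative', first_vote))
print(winner)  # ['Positive' 'Positive' 'Negative']
```

Note that the sum is a net score rather than a vote count: a row such as Positive, Neutral, Neutral sums to 1 and would be labelled Positive even though Neutral has the majority. For the data in the question the two readings agree.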
Answer 2 (score: 0)
I wrote a function and applied it to the df. This is usually a bit faster than a normal loop.
import pandas as pd
import numpy as np
def vote(row):
    # Count the Positive and Negative votes in this row
    pos = np.sum(row.values == 'Positive')
    neg = np.sum(row.values == 'Negative')
    if pos > neg:
        return 'Positive'
    elif pos < neg:
        return 'Negative'
    else:
        # Tie: fall back to the first vote
        return row['Vote1']
# Create the dataframe
df = pd.DataFrame()
df['id']=[123,223,323,423]
df['Vote1']=['Positive']*4
df['Vote2']=['Negative']*3+['Positive']
df['Vote3']=['Positive','Neutral','Negative','']
df = df.set_index('id')
df['Winner'] = df.apply(vote,axis=1)
Result:
        Vote1     Vote2     Vote3    Winner
id
123  Positive  Negative  Positive  Positive
223  Positive  Negative   Neutral  Positive
323  Positive  Negative  Negative  Negative
423  Positive  Positive            Positive
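For readers unfamiliar with apply, the sketch below shows what the function receives when axis=1 is used: each row arrives as a pandas Series indexed by the column names, which is why row['Vote1'] works inside vote. The helper describe and the one-row frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Vote1': ['Positive'], 'Vote2': ['Negative']}, index=[123])

def describe(row):
    # With axis=1, row is a Series whose index is the column names.
    return f"{type(row).__name__}:{row['Vote1']}"

out = df.apply(describe, axis=1)
print(out.loc[123])  # Series:Positive
```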