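I have a DataFrame with two columns, winner and newcol2. Both columns contain the values white, black, and draw. I want to compare the two columns row by row. For example, if winner is white and newcol2 is black, return 1.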
  winner newcol2
0  white   black
1  black   white
2  white   white
3   draw   white
4  black    draw
conditions1 = [
    (x['winner'] == 'white'),
    (x['winner'] == 'draw'),
    (x['winner'] == 'black')]
conditions2 = [
    (x['newcol2'] == 'white'),
    (x['newcol2'] == 'draw'),
    (x['newcol2'] == 'black')]
x['result'] = np.select(conditions1, conditions2, default='null')
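I tried to solve my problem with the code above, but I am struggling with whether the equality and inequality comparisons between the values evaluate to true or false.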
Answer 0 (score: 2)
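As I understand it, you want to assign a value to each unique combination of the two columns in the DataFrame.
You can build a dictionary of codes from the combinations that actually occur in the data, as shown here. If not every combination appears in the DataFrame, you can generate the dictionary with itertools instead.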
combs = set(zip(df['winner'], df['newcol2']))
codes = dict(zip(combs, range(len(combs))))
Then use the apply method to replace each combination in the two columns with its code:
df['result'] = df.apply(lambda x: codes[x['winner'], x['newcol2']], axis=1)
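The itertools option mentioned above could look roughly like the following. This is only a sketch: the `labels` list is an assumption (it presumes white, black, and draw are the only values either column can take), and the sample DataFrame is rebuilt from the question. Unlike a dictionary built from a set, the codes here come out in the same order on every run.

```python
from itertools import product

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "winner":  ["white", "black", "white", "draw", "black"],
    "newcol2": ["black", "white", "white", "white", "draw"],
})

# Assumption: these three labels are the only values in either column.
labels = ["white", "black", "draw"]

# product() yields all 9 ordered pairs in a fixed order, so each pair
# always receives the same integer code.
codes = {pair: code for code, pair in enumerate(product(labels, repeat=2))}

# Look up each row's (winner, newcol2) pair in the dictionary.
df["result"] = [codes[pair] for pair in zip(df["winner"], df["newcol2"])]
```

With this ordering, ('white', 'white') maps to 0 and ('draw', 'draw') maps to 8; a pair that is missing from `labels` would raise a KeyError.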
Answer 1 (score: 0)
I'm not sure what all of your conditions are beyond the example you gave, but this will work:
In [23]: conditions = []
In [24]: for row in df.itertuples():
    ...:     if row.winner == 'white' and row.newcol2 == 'black':
    ...:         conditions.append(1)
    ...:     elif row.winner == 'black' and row.newcol2 == 'white':
    ...:         conditions.append(1)
    ...:     else:
    ...:         conditions.append(0)
    ...:
In [25]: conditions
Out[25]: [1, 1, 0, 0, 0]
In [26]: df['conditions'] = conditions
In [27]: df
Out[27]:
  winner newcol2  conditions
0  white   black           1
1  black   white           1
2  white   white           0
3   draw   white           0
4  black    draw           0
You can adapt the code to your own conditions.
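The same rule can probably be written without an explicit loop. The sketch below is not the answerer's code. It combines the two column comparisons with `&` and passes them to `np.select`, which pairs each condition with one value from a list of choices (the question passed a second list of boolean conditions as the choices). The names `white_vs_black` and `black_vs_white` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "winner":  ["white", "black", "white", "draw", "black"],
    "newcol2": ["black", "white", "white", "white", "draw"],
})

# Each condition is a boolean Series that is True only where BOTH
# column comparisons hold for that row.
white_vs_black = (df["winner"] == "white") & (df["newcol2"] == "black")
black_vs_white = (df["winner"] == "black") & (df["newcol2"] == "white")

# np.select takes the value from `choices` for the first condition that
# is True in each row, and `default` where none of them is True.
conditions = [white_vs_black, black_vs_white]
choices = [1, 1]
df["conditions"] = np.select(conditions, choices, default=0)
```

To cover more cases, append further condition/choice pairs to the two lists; they must stay the same length.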