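I have a df with some binary columns (1, -1) and a list of N column names. I need to create a new variable like this...
df['test'] = np.where((df['Col1'] == -1) & (df['Col2'] == -1), -1, 0)
...but dynamically. The rule is: if all the columns in the list have the same value (1 or -1), take that value; otherwise the value is 0. The length of the list is not fixed. Should I simply loop over the list and build the "where" string, or is there a more elegant way?
Thanks!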
Answer 0 (score: 1)
IIUC you can just do
df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)
Here you pass a list of the columns of interest to sub-select from the original df. Since all you're doing is comparing those columns to a scalar value, you then call all(axis=1) to test whether every value in a row matches that value, and pass the resulting boolean mask to np.where as before.
e.g.:
list_of_col_names = ['col1','col2']
df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)
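To make the behaviour concrete, here is a small self-contained run of the same line. The sample data is made up for illustration; only the column names and the np.where expression come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two binary columns holding 1 or -1.
df = pd.DataFrame({
    "col1": [-1, -1, 1, 1],
    "col2": [-1, 1, -1, 1],
})

list_of_col_names = ["col1", "col2"]

# Row-wise check: True only where every selected column equals -1.
mask = (df[list_of_col_names] == -1).all(axis=1)

# -1 where the mask holds, 0 everywhere else.
df["test"] = np.where(mask, -1, 0)
print(df)
```

Only the first row has -1 in both columns, so only that row gets -1 in test; note that the last row (all 1) still gets 0 with this expression.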
It's important that you pass an actual list of names (or another list-like, such as an Index). If you do the following instead, it'll raise a KeyError:
df['test'] = np.where((df['col1','col2'] == -1).all(axis=1), -1, 0)
as pandas will interpret 'col1','col2' as a single tuple key, and it's likely that a column named ('col1', 'col2') doesn't exist.
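The question also asks for 1 when every listed column is 1, which the line above does not cover since it only tests for -1. One possible way to handle both values, offered as a sketch rather than as part of the original answer, is to compare every column to the first one and take the first column's value where they all agree. This assumes the columns contain only 1 and -1; the sample data and the names sub and all_same are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with three binary columns (1 or -1).
df = pd.DataFrame({
    "col1": [-1, -1, 1, 1],
    "col2": [-1, 1, -1, 1],
    "col3": [-1, -1, 1, 1],
})

list_of_col_names = ["col1", "col2", "col3"]
sub = df[list_of_col_names]

# Compare every selected column to the first one, row by row,
# then require agreement across the whole row.
first = sub.iloc[:, 0]
all_same = sub.eq(first, axis=0).all(axis=1)

# Take the shared value where all columns agree, otherwise 0.
df["test"] = np.where(all_same, first, 0)
print(df)
```

Because the comparison is against the first column rather than a fixed scalar, the same expression works for any number of columns in the list.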