Question

I have a df with some binary columns (1, -1) and a list of N column names. I need to create a new variable like this...

df['test'] = np.where(((df['Col1'] == -1) & (df['Col2'] == -1)), -1, 0)

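...but dynamically. So the rule is: if all columns in the list have the same value (1 or -1), take that value. Otherwise the value is 0. The length of the list is not fixed. Should I simply loop over the list and build the "where string", or is there a more elegant way?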

Thanks!

Answer 1

IIUC, you can just do:

df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)

Here you pass a list of the columns of interest to sub-select from the original df, since all you're doing is comparing those columns to a scalar value. You then call all(axis=1) to test whether every value in each row matches that value, and pass the resulting boolean mask to np.where as before.

e.g.:

list_of_col_names = ['col1','col2']
df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)
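
To see what the mask looks like, here is a small self-contained run. The frame contents and the extra 'other' column are made up for illustration; only the two listed columns take part in the test.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "col1": [-1, -1, 1],
    "col2": [-1, 1, 1],
    "other": [1, 1, 1],
})
list_of_col_names = ["col1", "col2"]

# Row-wise mask: True only where every selected column equals -1.
mask = (df[list_of_col_names] == -1).all(axis=1)

# -1 where the mask holds, 0 everywhere else.
df["test"] = np.where(mask, -1, 0)
print(df)
```

Only the first row has -1 in both listed columns, so only that row gets -1 in 'test'.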

It's important to pass an actual list (or other list-like) of names. If you do the following instead, it'll raise a KeyError:

df['test'] = np.where((df['col1','col2'] == -1).all(axis=1), -1, 0)

because pandas interprets 'col1','col2' as a single tuple key, and a column named ('col1', 'col2') most likely doesn't exist.
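
The question also asks for 1 when every listed column is 1, which the np.where line above doesn't cover. One possible way to handle both values, offered as a sketch rather than as part of the original answer, is to build two masks and combine them with np.select. The sample data is again made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "col1": [1, -1, 1, -1],
    "col2": [1, -1, -1, 1],
    "col3": [1, -1, 1, -1],
})
list_of_col_names = ["col1", "col2", "col3"]

sub = df[list_of_col_names]
all_pos = (sub == 1).all(axis=1)   # every listed column is 1
all_neg = (sub == -1).all(axis=1)  # every listed column is -1

# Take the first matching choice per row; fall back to 0 for mixed rows.
df["test"] = np.select([all_pos, all_neg], [1, -1], default=0)
print(df)
```

This works for any list length, since the masks are computed over whatever columns the list names.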

Python Pandas DF: create a new variable based on a list of columns
