For example: I have a DataFrame with 2 columns, and I want to add a third column based on a condition.
1stcolumn 2ndcolumn
abc. aaa-eee
abc. abc-def
abc. ccc-eee
abc. c-ee-f-g
abc. a
abc. eee-eee
abc. bbb-ddd
The condition is:
If the value in the second column looks like [any value in a_list]-[any value in b_list], the value in the third column should be "controlled", else "not controlled".
a_list = ['aaa','bbb','ccc']
b_list = ['ddd','eee']
Expected output:
1stcolumn 2ndcolumn 3rdcolumn
abc. aaa-eee controlled
abc. abc-def not controlled
abc. ccc-eee controlled
abc. c-ee-f-g not controlled
abc. a not controlled
abc. eee-eee not controlled
abc. bbb-ddd controlled
Answer 0 (score: 0)
Create all combinations of the values with itertools.product, test the column against them with Series.isin, and set the new values with numpy.where:
import numpy as np
from itertools import product
a_list = ['aaa','bbb','ccc']
b_list = ['ddd','eee']
new = [f'{a}-{b}' for a, b in product(a_list, b_list)]
print (new)
['aaa-ddd', 'aaa-eee', 'bbb-ddd', 'bbb-eee', 'ccc-ddd', 'ccc-eee']
df['3rdcolumn'] = np.where(df['2ndcolumn'].isin(new),'controlled','not controlled')
print (df)
1stcolumn 2ndcolumn 3rdcolumn
0 abc. aaa-eee controlled
1 abc. abc-def not controlled
2 abc. ccc-eee controlled
3 abc. c-ee-f-g not controlled
4 abc. a not controlled
5 abc. eee-eee not controlled
6 abc. bbb-ddd controlled
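For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample rows in the question, and the names `allowed` and `mask` are introduced here only for illustration. The rest follows the same product / isin / where steps as above.

```python
from itertools import product

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed layout: two string columns).
df = pd.DataFrame({
    '1stcolumn': ['abc.'] * 7,
    '2ndcolumn': ['aaa-eee', 'abc-def', 'ccc-eee', 'c-ee-f-g',
                  'a', 'eee-eee', 'bbb-ddd'],
})

a_list = ['aaa', 'bbb', 'ccc']
b_list = ['ddd', 'eee']

# Every allowed "a-b" string: one entry per (a, b) pair.
allowed = [f'{a}-{b}' for a, b in product(a_list, b_list)]

# True only where the whole cell equals one of the allowed strings.
mask = df['2ndcolumn'].isin(allowed)

# Map the boolean mask to the two labels.
df['3rdcolumn'] = np.where(mask, 'controlled', 'not controlled')
print(df)
```

Because isin compares whole values, a row such as eee-eee is labelled "not controlled": eee appears in b_list but not in a_list, so the string eee-eee is never generated. The list of combinations grows as len(a_list) * len(b_list), which is fine for short lists like these.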