Applying a condition to a pandas DataFrame column

Time: 2020-04-16 05:57:31

Tags: python pandas numpy dataframe

For example: I have a DataFrame with 2 columns, and I want to add a third column based on a condition.

1stcolumn  2ndcolumn 
abc.       aaa-eee
abc.       abc-def
abc.       ccc-eee
abc.       c-ee-f-g
abc.       a
abc.       eee-eee
abc.       bbb-ddd

The condition is: if the value in the second column has the form [any value in a_list]-[any value in b_list], the third column's value should be "controlled", else "not controlled".

a_list = ['aaa','bbb','ccc'] 
b_list = ['ddd','eee']

Expected output:

1stcolumn  2ndcolumn    3rdcolumn
abc.       aaa-eee      controlled
abc.       abc-def      not controlled
abc.       ccc-eee      controlled
abc.       c-ee-f-g     not controlled
abc.       a            not controlled
abc.       eee-eee      not controlled
abc.       bbb-ddd      controlled

1 answer:

Answer 0 (score: 0)

Create all combinations of the values with itertools.product, test the column with Series.isin, and set the new values with numpy.where:

import numpy as np
from itertools import product

a_list = ['aaa','bbb','ccc'] 
b_list = ['ddd','eee']
new = [f'{a}-{b}' for a, b in product(a_list, b_list)]
print(new)
['aaa-ddd', 'aaa-eee', 'bbb-ddd', 'bbb-eee', 'ccc-ddd', 'ccc-eee']

df['3rdcolumn'] = np.where(df['2ndcolumn'].isin(new), 'controlled', 'not controlled')
print(df)
  1stcolumn 2ndcolumn      3rdcolumn 
0      abc.   aaa-eee      controlled
1      abc.   abc-def  not controlled
2      abc.   ccc-eee      controlled
3      abc.  c-ee-f-g  not controlled
4      abc.         a  not controlled
5      abc.   eee-eee  not controlled
6      abc.   bbb-ddd      controlled
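
The snippets above assume that df already exists. For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption rebuilt from the sample table in the question, and the names allowed and mask are only illustrative:

```python
from itertools import product

import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (an assumption, not the asker's real data).
df = pd.DataFrame({
    "1stcolumn": ["abc."] * 7,
    "2ndcolumn": ["aaa-eee", "abc-def", "ccc-eee", "c-ee-f-g",
                  "a", "eee-eee", "bbb-ddd"],
})

a_list = ["aaa", "bbb", "ccc"]
b_list = ["ddd", "eee"]

# Every permitted "a-b" string: the Cartesian product of the two lists.
allowed = [f"{a}-{b}" for a, b in product(a_list, b_list)]

# isin returns a boolean Series; np.where maps True/False to the two labels.
mask = df["2ndcolumn"].isin(allowed)
df["3rdcolumn"] = np.where(mask, "controlled", "not controlled")

print(df)
```

Note that the list of combinations has len(a_list) * len(b_list) entries, so this works best when both lists are reasonably small.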