Question

I have a pandas DataFrame like this:

index | Creative Size       | Business Model
1     | Something trueview  |
2     | trueviewhello       |
3     | dunno               |
4     | str                 |
5     | str                 |

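I want to write code that assigns the label "CPV" to the corresponding row of Business Model if "trueview" appears in the Creative Size column, and "CPM" otherwise. The expected output is: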

index | Creative Size       | Business Model
1     | Something trueview  | CPV
2     | truviewhello        | CPV
3     | dunno               | CPM
4     | str                 | CPM
5     | str                 | CPM

I came up with this:

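# Iterate over (index label, value) pairs so .loc writes to the matching
# row even when the index does not start at 0 (here it starts at 1)
for idx, size in db_all['Creative Size'].items():
    if 'trueview' in size:
        db_all.loc[idx, 'Business Model'] = 'CPV'
    else:
        db_all.loc[idx, 'Business Model'] = 'CPM'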

It works, but it is slow. Is there a better way?

Answer 1

Use numpy.where together with Series.str.contains:

import numpy as np

# Vectorized: build a boolean mask once, then pick 'CPV' or 'CPM' per row
db_all['Business Model'] = np.where(db_all['Creative Size'].str.contains('trueview'),
                                    'CPV',
                                    'CPM')
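A self-contained sketch of the answer applied to the sample data from the question. It assumes row 2 is meant to read `trueviewhello`, as the expected output implies. The `regex=False` and `na=False` arguments to `str.contains` are additions not present in the original answer: the first treats "trueview" as plain text rather than a regular expression, the second maps missing values to False so they get "CPM".

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: row 2 reads "trueviewhello")
db_all = pd.DataFrame(
    {"Creative Size": ["Something trueview", "trueviewhello", "dunno", "str", "str"]},
    index=[1, 2, 3, 4, 5],
)

# Boolean mask: True where the literal substring "trueview" appears.
# regex=False -> plain-text match; na=False -> missing values count as no match.
mask = db_all["Creative Size"].str.contains("trueview", regex=False, na=False)

# Vectorized choice between the two labels, one per row, no Python loop
db_all["Business Model"] = np.where(mask, "CPV", "CPM")

print(db_all)
```

Because the whole column is processed in one call instead of one `.loc` assignment per row, this scales much better than the loop on large frames.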

Assign a string to a column if the corresponding row of another column contains a given substring, otherwise assign a different string
