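I'm trying to filter a column: if the column contains a certain string, I want to put a specific value into a new column. For example: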
conditions = [df['columnA'].str.contains('valueA')]
choices = ['valueB']
df['columnB'] = np.select(conditions, choices, default = 'default')
But when I run it, I get the following error:
ValueError: invalid entry in choicelist: should be boolean ndarray
What am I doing wrong?
Answer 0 (score: 3)
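You need the parameter na=False in str.contains because of the NaN values in the boolean mask, as unutbu commented. Without it, str.contains returns NaN for missing values, so the mask has dtype object instead of bool, and np.select rejects it (the message says "choicelist", but the check that fails is on the conditions list):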
conditions = [df['columnA'].str.contains('valueA', na=False)]
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'columnA':['valueA ff','ss valueA','valueA 4','w','e',np.nan]})
print (df)
     columnA
0  valueA ff
1  ss valueA
2   valueA 4
3          w
4          e
5        NaN
print (df['columnA'].str.contains('valueA'))
0     True
1     True
2     True
3    False
4    False
5      NaN
Name: columnA, dtype: object
print (df['columnA'].str.contains('valueA', na=False))
0     True
1     True
2     True
3    False
4    False
5    False
Name: columnA, dtype: bool
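The dtype difference above (object vs bool) is exactly what np.select checks. Here is a minimal sketch, using a small made-up Series, that passes both masks to np.select. The exception type NumPy raises for a non-boolean condition has changed between versions (the question shows ValueError; newer releases may raise TypeError), so the sketch catches both. Also, newer pandas versions with a dedicated string dtype may already return a bool mask here, in which case the first call simply succeeds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a single missing value is enough to cause the problem
s = pd.Series(['valueA ff', 'w', np.nan])

raw_mask = s.str.contains('valueA')             # missing entries may come back as NaN
safe_mask = s.str.contains('valueA', na=False)  # missing entries become False

print(raw_mask.dtype)   # typically object for an object-dtype column
print(safe_mask.dtype)  # bool

try:
    np.select([raw_mask], ['valueB'], default='default')
except (TypeError, ValueError) as exc:
    # NumPy refuses non-boolean conditions; the exception class depends on the version
    print('np.select rejected the mask:', exc)

# With na=False the condition is a plain boolean array, so np.select accepts it
result = np.select([safe_mask], ['valueB'], default='default')
print(result)
```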
Putting it all together:
conditions = [df['columnA'].str.contains('valueA', na=False)]
choices = ['valueB']
df['columnB'] = np.select(conditions, choices, default = 'default')
print (df)
     columnA  columnB
0  valueA ff   valueB
1  ss valueA   valueB
2   valueA 4   valueB
3          w  default
4          e  default
5        NaN  default
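The same pattern extends to several substrings: every mask needs na=False, and np.select uses the choice of the first condition that is True for each row. A sketch with made-up data and labels ('valueC' and 'valueD' are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 matches both substrings, the last row is missing
df = pd.DataFrame({'columnA': ['valueA ff', 'ss valueC', 'valueA valueC', 'w', np.nan]})

conditions = [
    df['columnA'].str.contains('valueA', na=False),  # checked first
    df['columnA'].str.contains('valueC', na=False),  # only wins if the first is False
]
choices = ['valueB', 'valueD']

# For each row, np.select picks the choice of the first True condition, else default
df['columnB'] = np.select(conditions, choices, default='default')
print(df)
```

Because the first matching condition wins, order the conditions from most to least specific if a row can match more than one.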