Pandas - invalid entry in choicelist

Time: 2017-11-13 12:44:47

Tags: python pandas numpy

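I am trying to filter on a column: if the column contains a certain string, I want to write a specific value into a new column. For example: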

conditions = [df['columnA'].str.contains('valueA')]
choices    = ['valueB']

df['columnB'] = np.select(conditions,  choices, default = 'default')

But when I run it, I get the following error:

ValueError: invalid entry in choicelist: should be boolean ndarray

What am I doing wrong?

1 answer:

Answer 0 (score: 3):

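You need the parameter na=False in str.contains, as unutbu pointed out in a comment. Without it, the NaN in columnA produces a NaN in the mask, so the mask has dtype object instead of bool, and np.select rejects it: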

conditions = [df['columnA'].str.contains('valueA', na=False)]

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'columnA':['valueA  ff','ss valueA','valueA 4','w','e',np.nan]})
print (df)
      columnA
0  valueA  ff
1   ss valueA
2    valueA 4
3           w
4           e
5         NaN
print (df['columnA'].str.contains('valueA'))
0     True
1     True
2     True
3    False
4    False
5      NaN
Name: columnA, dtype: object

print (df['columnA'].str.contains('valueA', na=False))
0     True
1     True
2     True
3    False
4    False
5    False
Name: columnA, dtype: bool

All together:

conditions = [df['columnA'].str.contains('valueA', na=False)]
choices    = ['valueB']

df['columnB'] = np.select(conditions,  choices, default = 'default')
print (df)
      columnA  columnB
0  valueA  ff   valueB
1   ss valueA   valueB
2    valueA 4   valueB
3           w  default
4           e  default
5         NaN  default
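
The same fix carries over when there is more than one condition: each mask passed to np.select needs its own na=False. The following is only a minimal sketch of that case. The second pattern 'valueC' and its choice 'valueD' are hypothetical values added for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Sample data; 'valueC' is a hypothetical second pattern added for this sketch.
df = pd.DataFrame({'columnA': ['valueA  ff', 'ss valueA', 'valueC 4', 'w', 'e', np.nan]})

# Every mask gets na=False so that it has dtype bool rather than object.
conditions = [
    df['columnA'].str.contains('valueA', na=False),
    df['columnA'].str.contains('valueC', na=False),
]
choices = ['valueB', 'valueD']

# Optional sanity check: all masks are boolean before np.select sees them.
assert all(mask.dtype == bool for mask in conditions)

# Rows matching no condition (including the NaN row) fall back to the default.
df['columnB'] = np.select(conditions, choices, default='default')
print(df)
```

The dtype check is optional, but it catches a forgotten na=False before np.select is called.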