Sample input DataFrame
Region    Name
Europe    Project-Europe
Unknown   Project_Mexico
Unknown   Project USA
Unknown   Project
Paraguay  Project
Expected DataFrame
Region    Name            New_Region
Europe    Project-Europe  Europe
Unknown   Project_Mexico  Mexico
Unknown   Project USA     USA
Unknown   Project         Unknown
Paraguay  Project         Paraguay
Sample list
country_list= ['USA','MEXICO','Europe']
Code (partially working):
pattern = '|'.join(country_list).lower()
df['New_Region'] = ariba_df['Name'].str.lower().str.contains(pattern)
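Problem statement
The code above creates the New_Region column, but it contains True or False; I need the matched value shown in the expected output.
The matching above should be applied only when the Region column is Unknown.

Answer 0 (score: 3)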
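Use Series.str.extract with re.I to ignore case, and fillna for the rows with no match. Finally, add numpy.where to set the values through a boolean mask, only where Region is Unknown: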
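import re
import numpy as np

country_list = ['USA', 'MEXICO', 'Europe']
pattern = '|'.join(country_list)
# Only rows whose Region is 'Unknown' should receive the extracted value
mask = df['Region'] == 'Unknown'
# Extract the first matching country name, ignoring case; no match -> 'Unknown'
s = (df['Name'].str.extract('(' + pattern + ')', flags=re.I, expand=False)
               .fillna('Unknown'))
df['New_Region'] = np.where(mask, s, df['Region'])
print(df)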
     Region            Name  New_Region
0    Europe  Project-Europe      Europe
1   Unknown  Project_Mexico      Mexico
2   Unknown     Project USA         USA
3   Unknown         Project     Unknown
4  Paraguay         Project    Paraguay
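
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It builds the sample DataFrame from the question and contrasts str.contains (booleans) with str.extract (the matched text). The use of re.escape and the variable names alternation, found and extracted are my own additions for illustration, not part of the original answer; re.escape only matters if a country name ever contains regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Region": ["Europe", "Unknown", "Unknown", "Unknown", "Paraguay"],
    "Name": ["Project-Europe", "Project_Mexico", "Project USA", "Project", "Project"],
})
country_list = ["USA", "MEXICO", "Europe"]

# Assumption: escape each name so regex metacharacters are matched literally.
alternation = "|".join(re.escape(country) for country in country_list)

# str.contains only reports whether any country matched (True/False).
found = df["Name"].str.contains(alternation, flags=re.I)

# str.extract returns the matched text itself; rows with no match become NaN,
# which fillna replaces with 'Unknown'.
extracted = (
    df["Name"]
    .str.extract("(" + alternation + ")", flags=re.I, expand=False)
    .fillna("Unknown")
)

# Use the extracted value only where Region is 'Unknown'; keep Region otherwise.
df["New_Region"] = np.where(df["Region"] == "Unknown", extracted, df["Region"])

print(found.tolist())
print(df)
```

Note that the extracted value keeps the spelling found in Name (for example Mexico), not the spelling in country_list (MEXICO), which is what the expected output asks for.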