Get the matching list element from a pandas column

Time: 2019-02-25 09:32:27

Tags: python pandas

Sample input DF

Region     Name
Europe     Project-Europe
Unknown    Project_Mexico
Unknown    Project USA
Unknown    Project
Paraguay   Project

Expected DF

Region     Name                   New_Region
Europe     Project-Europe         Europe
Unknown    Project_Mexico         Mexico
Unknown    Project USA            USA
Unknown    Project                Unknown
Paraguay   Project                Paraguay

Sample list

country_list= ['USA','MEXICO','Europe']

Code (partially working):

pattern = '|'.join(country_list).lower()
df['New_Region'] = df['Name'].str.lower().str.contains(pattern)

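Problem statement

  1. The code above creates a new column New_Region, but it contains True/False. I need the matched value, as shown in the expected output.
  2. The matching should be applied only to rows where the Region column is "Unknown".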
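To illustrate the first point, here is a minimal sketch of why the column ends up boolean. The DataFrame is rebuilt by hand from the sample input table, which is an assumption about how the data is actually loaded, and the name `flags` is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the input table above (assumed construction).
df = pd.DataFrame({
    "Region": ["Europe", "Unknown", "Unknown", "Unknown", "Paraguay"],
    "Name": ["Project-Europe", "Project_Mexico", "Project USA",
             "Project", "Project"],
})
country_list = ["USA", "MEXICO", "Europe"]

# Lowercased alternation: 'usa|mexico|europe'
pattern = "|".join(country_list).lower()

# str.contains only reports whether the pattern occurs in each value,
# so the result holds True/False rather than the matched text.
flags = df["Name"].str.lower().str.contains(pattern)
print(flags.tolist())
```

Getting the matched text itself requires a method that returns the captured substring, not a test for its presence.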

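1 Answer:

Answer 0: (score: 3)

Use Series.str.extract with re.I to ignore case, then fillna to fill the rows that have no match:

Finally, use numpy.where to set the extracted values only where the boolean mask is True: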

import re

import numpy as np

country_list = ['USA', 'MEXICO', 'Europe']

pattern = '|'.join(country_list)
mask = df['Region'] == 'Unknown'

s = (df['Name'].str.extract('(' + pattern + ')', flags=re.I, expand=False)
                   .fillna('Unknown'))

df['New_Region'] = np.where(mask, s, df['Region'])
print(df)

     Region            Name New_Region
0    Europe  Project-Europe     Europe
1   Unknown  Project_Mexico     Mexico
2   Unknown     Project USA        USA
3   Unknown         Project    Unknown
4  Paraguay         Project   Paraguay
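
For readers who want to run the approach end to end, the following is a self-contained sketch of the same technique. The DataFrame construction is an assumption rebuilt from the sample input, the variable name `matched` is illustrative, and the `re.escape` call is an addition that is not part of the answer above. It only matters if a list entry contains regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's input table (assumed construction).
df = pd.DataFrame({
    "Region": ["Europe", "Unknown", "Unknown", "Unknown", "Paraguay"],
    "Name": ["Project-Europe", "Project_Mexico", "Project USA",
             "Project", "Project"],
})
country_list = ["USA", "MEXICO", "Europe"]

# One capture group with all countries as alternatives.
# re.escape is a defensive addition, not part of the original answer.
pattern = "(" + "|".join(re.escape(c) for c in country_list) + ")"

# Rows whose Region should be replaced by the extracted country.
mask = df["Region"] == "Unknown"

# Case-insensitive extraction; rows without a match become 'Unknown'.
matched = (df["Name"]
           .str.extract(pattern, flags=re.I, expand=False)
           .fillna("Unknown"))

# Take the extracted value where Region is 'Unknown', else keep Region.
df["New_Region"] = np.where(mask, matched, df["Region"])
print(df)
```

Note that the extracted text keeps the casing found in Name, so the second row yields "Mexico" and not the list's "MEXICO".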