Strict regular expression replacement in pandas

Time: 2017-09-21 13:51:39

Tags: regex python-2.7 pandas replace

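I need to write a strict regular expression to replace certain values in a pandas DataFrame. This question came up after solving the one I posted here.

The problem is that .replace(idsToReplace, regex=True) is not strict. So if idsToReplace is: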

NY : New York
NYC : New York City

and the comment in which we are replacing the IDs is:

My cat from NYC is large.

the result is:

My cat from New York is large.
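
The loose matching can be seen with the standard re module alone. This is only a minimal sketch, and it assumes that pandas treats the dictionary keys as ordinary Python regex patterns when regex=True; the variable names are illustrative.

```python
import re

comment = 'My cat from NYC is large.'

# An unanchored pattern matches anywhere in the string,
# including inside a longer ID such as 'NYC'.
match = re.search('NY', comment)

# The match covers only the 'NY' part of 'NYC'.
print(match.group(0), match.span())
```

Because nothing in the pattern says where the ID has to end, NY is found inside NYC.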

Is there a pythonic way, within the pandas replace function, to make the regular expression stricter so that it matches NYC rather than NY?

1 answer:

Answer 0 (score: 1)

Add word boundaries (\b) to each key of the dict:

import pandas as pd

# Mapping from ID to full name
d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}

data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}

# Wrap each key in \b so that it only matches as a whole word
d = {r'\b' + k + r'\b': v for k, v in d.items()}

df = pd.DataFrame(data)

# regex=True makes replace treat the dict keys as regular expressions
df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                         commentTest  
0                 The New York City tree is very big  
1  New York The cat from the United Kingdom is small  
2                 The rock was found in Los Angeles.
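
As a possible variation (not part of the answer above), the IDs can also be combined into one whole-word pattern and replaced in a single pass, so that replacement text is never scanned again by a later key. The sketch below uses only the standard re module; build_replacer and replace_ids are hypothetical names, and re.escape is added on the assumption that some IDs might contain regex metacharacters.

```python
import re

# Same mapping as in the question.
ids_to_replace = {'NY': 'New York', 'NYC': 'New York City'}

def build_replacer(mapping):
    """Return a function that swaps whole-word IDs for their full names."""
    # Longest keys first, so 'NYC' is tried before 'NY' in the alternation.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The callback looks up each matched ID, so all IDs are handled in one pass.
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)

replace_ids = build_replacer(ids_to_replace)
print(replace_ids('My cat from NYC is large.'))  # My cat from New York City is large.
print(replace_ids('NY is not NYC.'))             # New York is not New York City.
```

With pandas, a function like this could be applied row by row, for example df['Comment'].map(replace_ids), provided every value in the column is a string.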