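I need to write a strict regular expression to replace certain values in a pandas DataFrame. This question came up after I solved the problem I posted here.
The problem is that .replace(idsToReplace, regex=True) is not strict. So, if idsToReplace is: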
NY : New York
NYC : New York City
and the comment in which we are replacing IDs is:
My cat from NYC is large.
the result we get is:
My cat from New York is large.
Is there a pythonic way, within the pandas replace function, to make the regular expression strict enough to match NYC rather than NY?
Answer 0 (score: 1)
Add word boundaries (\b) around each key of the dict, so that a key only matches as a whole word:
import pandas as pd

# Abbreviation -> full name mapping
d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}
data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']
        }
# Wrap each key in \b so that it only matches as a whole word
d = {r'\b' + k + r'\b': v for k, v in d.items()}
df = pd.DataFrame(data)
df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
   Categories                          Comment  Type  \
0      animal         The NYC tree is very big  tree
1       plant  NY The cat from the UK is small   dog
2      object        The rock was found in LA.  rock

                                         commentTest
0                 The New York City tree is very big
1  New York The cat from the United Kingdom is small
2                 The rock was found in Los Angeles.
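
To see the effect of the word boundaries in isolation, here is a minimal sketch that uses only the standard `re` module on the comment from the question. The helper names `add_word_boundaries` and `replace_all` are hypothetical and not part of pandas. The sketch also passes each key through `re.escape`, which is an extra precaution not present in the answer above, and it assumes that every key starts and ends with a word character (otherwise `\b` would not behave as intended).

```python
import re

# Mapping taken from the question's example.
ids_to_replace = {"NY": "New York", "NYC": "New York City"}


def add_word_boundaries(mapping):
    """Return a new dict whose keys are escaped and wrapped in word-boundary anchors."""
    return {r"\b" + re.escape(key) + r"\b": value for key, value in mapping.items()}


def replace_all(text, patterns):
    """Apply each pattern in turn with re.sub (hypothetical helper for illustration)."""
    for pattern, replacement in patterns.items():
        text = re.sub(pattern, replacement, text)
    return text


strict = add_word_boundaries(ids_to_replace)
result = replace_all("My cat from NYC is large.", strict)
print(result)  # My cat from New York City is large.
```

With the anchors in place, the `NY` pattern no longer matches inside `NYC`, so the order of the keys stops mattering. A dict built this way should also be usable as the first argument of `df['Comment'].replace(..., regex=True)`, as in the answer above.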