Question

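I am working on the following task:

There are also several countries with numbers and/or parentheses in their names. Be sure to remove these,

for example

"Cuba (Caribbean)" should be "Cuba".

Input DataFrame: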

    Country                         Energy    
18  Mexico                          321000000   
19  Cuba (Island of Caribeas)      102000000    
20  Algeria                        1959000000   
21  American                        2252661245  
22  Andorra(no mentioned)            9000000

I want to get this DataFrame (output):

   Country                           Energy    
18  Mexico                          321000000   
19  Cuba                           102000000    
20  Algeria                        1959000000   
21  American                        2252661245  
22  Andorra                         9000000

This is what I have tried:

for item in df['Country']: #remove the () with the data inside
   re.sub(r" ?\(\w+\)", "", item)

But my DataFrame does not change and there is no error, so I don't know what I am doing wrong. Can someone please help me?

Answer 1

This might be a start... Try:

df['Country'] = df['Country'].apply(lambda x: re.sub(r" ?\(\w+\)", "", x))

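This applies the expression to each value in df['Country'] and assigns the result back to the column. re.sub returns a new string rather than modifying the original, which is why the loop in the question left the DataFrame unchanged.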
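To make the difference concrete, here is a minimal sketch comparing the loop from the question with the apply version. It assumes a small sample DataFrame built from values in the question; the variable names `pattern`, `unchanged` and `after_apply` are only illustrative.

```python
import re

import pandas as pd

# Sample values taken from the question; Energy is included only for shape.
df = pd.DataFrame({
    "Country": ["Mexico", "Cuba (Caribbean)", "Cuba (Island of Caribeas)"],
    "Energy": [321000000, 102000000, 102000000],
})

pattern = r" ?\(\w+\)"

# Loop from the question: each re.sub result is a new string that is discarded.
for item in df["Country"]:
    re.sub(pattern, "", item)
unchanged = df["Country"].tolist()

# Assigning the result of apply back to the column is what changes the DataFrame.
df["Country"] = df["Country"].apply(lambda x: re.sub(pattern, "", x))
after_apply = df["Country"].tolist()

print(unchanged)
print(after_apply)
```

With this pattern only the single-word case "Cuba (Caribbean)" is cleaned. The multi-word "Cuba (Island of Caribeas)" stays as it is, because `\w+` does not match the spaces inside the parentheses.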

Answer 2

The regular expression is not quite right: what if there are spaces inside the parentheses?

import pandas as pd

s = pd.Series(['Cuba (Island of Caribeas)', 'Andorra(no mentioned)', 'Algeria'])

s.replace(r" ?\((?:\w+ ?)+\)", "", regex=True)

This returns:

Out[13]: 
0       Cuba
1    Andorra
2    Algeria
dtype: object

To adapt it to your example:

df['Country'] = df['Country'].replace(r" ?\((?:\w+ ?)+\)", "", regex=True)
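Since the question also mentions numbers in the names, one possible variation (not taken from either answer) is to chain two `Series.str.replace` calls: one that drops anything between parentheses, and one that drops digits. This is only a sketch; the value "Mexico2" is a made-up example of a name with a trailing number, and `cleaned` is an illustrative variable name.

```python
import pandas as pd

# "Mexico2" is a hypothetical value, added to show the digit case.
s = pd.Series(["Cuba (Island of Caribeas)", "Andorra(no mentioned)", "Algeria", "Mexico2"])

cleaned = (
    s.str.replace(r"\s*\([^)]*\)", "", regex=True)  # drop "(...)" plus any whitespace before it
     .str.replace(r"\d+", "", regex=True)           # drop any run of digits
     .str.strip()                                   # trim leftover leading/trailing whitespace
)

print(cleaned.tolist())
```

The `[^)]*` part matches any characters up to the closing parenthesis, so it does not depend on what the parenthesized text contains. Note that it would also strip digits that legitimately belong to a name, so check it against your real data first.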

Remove the parenthesized part of strings in pandas
