Question

I am trying to extract a substring from the username column, but I am not getting the expected result. My df is shown below:

import pandas as pd

data = {'Name':['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG','inf.dkprime.netmgmt','infaus_mgo','infau_abr']}
df = pd.DataFrame(data)

print(df)

                  Name
0    inf.negem.netmgmt
1            infbe_cdb
2              inf_igh
3           INF_EONLOG
4  inf.dkprime.netmgmt
5           infaus_mgo
6            infau_abr    

I tried the following code, but I am not getting the expected result:
df['Country'] = df['Name'].str.slice(3,6)

I would like to see output like this:
output  = {'Country':['No_Country', 'be', 'No_Country', 'No_Country','No_Country','aus','au']}
df = pd.DataFrame(output) 

print(df)

      Country
0  No_Country
1          be
2  No_Country
3  No_Country
4  No_Country
5         aus
6          au

Note: I would like to extract the text between 'inf' and '_' as the country and store it in a new column named Country. If there is nothing after 'inf', the value should be 'No_Country'.

Answer 1

Here is one way using str.extract:

df['Country'] = (df.Name.str.lower()
                        .str.extract(r'inf(.*?)_')
                        .replace('', float('nan'))
                        .fillna('No_Country'))

print(df)

                  Name     Country
0    inf.negem.netmgmt  No_Country
1            infbe_cdb          be
2              inf_igh  No_Country
3           INF_EONLOG  No_Country
4  inf.dkprime.netmgmt  No_Country
5           infaus_mgo         aus
6            infau_abr          au
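
To see why both the replace and the fillna steps are needed, here is a minimal sketch that runs the extraction on a shorter sample. The names `sample`, `raw` and `country` are only for illustration and are not part of the answer above. str.extract gives NaN when the pattern does not match at all, and an empty string when 'inf' is immediately followed by '_'.

```python
import pandas as pd

# Shortened sample of the names from the question
sample = pd.Series(['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG'])

# expand=False returns a Series instead of a one-column DataFrame
raw = sample.str.lower().str.extract(r'inf(.*?)_', expand=False)

# No '_' after 'inf'  -> NaN (the pattern does not match)
# 'inf_' or 'INF_'    -> ''  (match with an empty capture group)
country = raw.replace('', float('nan')).fillna('No_Country')

print(country.tolist())
```

Without the replace step, rows such as 'inf_igh' would keep an empty string instead of 'No_Country', because fillna only touches missing values.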

Answer 2

Using a list comprehension and re.findall:

import re
df['Country'] = ["".join(re.findall(r'inf(.*?)_', i)) for i in df['Name']]


print(df)
                  Name  Country
0    inf.negem.netmgmt
1            infbe_cdb       be
2              inf_igh
3           INF_EONLOG
4  inf.dkprime.netmgmt
5           infaus_mgo      aus
6            infau_abr       au
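
The list comprehension above leaves an empty string where no country is found, whereas the question asks for 'No_Country'. One possible variant is sketched below, assuming a helper named `extract_country` (a hypothetical name, not from the answer) that lowercases the name before searching and falls back to 'No_Country'.

```python
import re
import pandas as pd

PATTERN = re.compile(r'inf(.*?)_')

def extract_country(name):
    """Return the text between 'inf' and '_', or 'No_Country' (hypothetical helper)."""
    match = PATTERN.search(name.lower())
    # `or` covers both "no match" and "matched an empty string"
    return (match.group(1) if match else '') or 'No_Country'

data = {'Name': ['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
                 'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr']}
df = pd.DataFrame(data)
df['Country'] = [extract_country(name) for name in df['Name']]

print(df)
```

Lowercasing the name first mirrors the str.lower() call in the str.extract approach, so 'INF_EONLOG' is handled the same way as 'inf_igh'.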

Extracting a substring from a pandas column
