I am trying to extract a substring from the username column, but I am not getting the expected result. My df looks like this:
import pandas as pd
data = {'Name':['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG','inf.dkprime.netmgmt','infaus_mgo','infau_abr']}
df = pd.DataFrame(data)
print(df)
Name
0 inf.negem.netmgmt
1 infbe_cdb
2 inf_igh
3 INF_EONLOG
4 inf.dkprime.netmgmt
5 infaus_mgo
6 infau_abr
I tried the following code, but I am not getting the expected result:
df['Country'] = df['Name'].str.slice(3,6)
I would like to see output like this:
output = {'Country':['No_Country', 'be', 'No_Country', 'No_Country','No_Country','aus','au']}
df = pd.DataFrame(output)
print(df)
Country
0 No_Country
1 be
2 No_Country
3 No_Country
4 No_Country
5 aus
6 au
Note: I would like to extract the text between 'inf' and '_' as the country and store it in a new column named Country. If there is nothing to extract after 'inf', the value should be 'No_Country'.
Answer 0 (score: 1)
Here is one way using str.extract:
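df['Country'] = (df['Name'].str.lower()                      # lowercase so 'INF' matches too
                   .str.extract(r'inf(.*?)_', expand=False)  # text between 'inf' and the first '_'
                   .replace('', float('nan'))                # an empty match means no country
                   .fillna('No_Country'))                    # no match or empty match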
print(df)
Name Country
0 inf.negem.netmgmt No_Country
1 infbe_cdb be
2 inf_igh No_Country
3 INF_EONLOG No_Country
4 inf.dkprime.netmgmt No_Country
5 infaus_mgo aus
6 infau_abr au
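As a possible variation on the approach above, the sketch below passes flags=re.IGNORECASE to str.extract instead of lowercasing first, so the extracted text keeps its original casing. The helper name extract_country and its default parameter are hypothetical, introduced only for illustration.

```python
import re
import pandas as pd

def extract_country(names: pd.Series, default: str = "No_Country") -> pd.Series:
    """Hypothetical helper: text between 'inf' and the first following '_'."""
    # expand=False returns a Series for a single capture group.
    codes = names.str.extract(r'inf(.*?)_', flags=re.IGNORECASE, expand=False)
    # No match yields NaN; an empty match (e.g. 'inf_igh') yields ''.
    # Treat both as missing and fill with the default label.
    return codes.replace('', float('nan')).fillna(default)

names = pd.Series(['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
                   'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr'])
countries = extract_country(names)
```

For the sample data this should give the same Country values as the output shown above; the casing only differs for names such as 'INFBE_x', where the lowercase version would return 'be' and this one 'BE'.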
Answer 1 (score: 0)
Using a list comprehension and re.findall:
import re
df['Country'] = ["".join(re.findall(r'inf(.*?)_', i)) for i in df['Name']]
print(df)
Name Country
0 inf.negem.netmgmt
1 infbe_cdb be
2 inf_igh
3 INF_EONLOG
4 inf.dkprime.netmgmt
5 infaus_mgo aus
6 infau_abr au
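Note that the output above leaves unmatched rows as empty strings rather than the 'No_Country' value the question asks for, and 'INF_EONLOG' never matches because the pattern is case-sensitive. A minimal sketch of one way to cover both cases, assuming re.search with re.IGNORECASE is acceptable here; country_code is a hypothetical helper name, not part of the original answer.

```python
import re

# Same pattern as above, compiled once and made case-insensitive.
PATTERN = re.compile(r'inf(.*?)_', flags=re.IGNORECASE)

def country_code(name, default='No_Country'):
    """Hypothetical helper: text between 'inf' and the first following '_'."""
    match = PATTERN.search(name)
    # group(1) is '' for names like 'inf_igh', so fall back to the default then too.
    return match.group(1).lower() if match and match.group(1) else default

names = ['inf.negem.netmgmt', 'infbe_cdb', 'inf_igh', 'INF_EONLOG',
         'inf.dkprime.netmgmt', 'infaus_mgo', 'infau_abr']
countries = [country_code(n) for n in names]
```

With a DataFrame, the same helper could be applied as df['Country'] = [country_code(n) for n in df['Name']].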