I am working with the Titanic dataset on Kaggle:
https://www.kaggle.com/c/titanic/data
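I am trying to handle all of the titles contained in the passenger names.
I can filter with the str.contains method to display the names that match none of my patterns: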
train[~train.Name.str.contains('Mr.|Mrs.|Miss.|Master.|Dr.|Rev.|Jonkheer.|Countess.|Major.|Col.|Capt.|Don.|Mme.|Mlle.')]['Name']
which shows what I have not captured yet:
443 Reynaldo, Ms. Encarnacion
Name: Name, dtype: object
So I wrote a mapper function to create another feature:
## title mapper function
def title_mapper(x):
    if x.contains('Mr.'):
        return 'Mr'
    elif x.contains('Mrs.|Mme.'):
        return 'Mrs'
    elif x.contains('Miss.|Mlle.'):
        return 'Miss'
    elif x.contains('Dr.'):
        return 'Dr'
    elif x.contains('Rev.'):
        return 'Cleric'
    elif x.contains('Jonkheer.|Countess.|Don.|Ms.'):
        return 'Noble'
    elif x.contains('Major.|Col.|Capt.'):
        return 'Military'
    else:
        return 'Other'
But it fails, saying that str has no attribute contains:
train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-63-7c9804f87141> in <module>
20 return 'Other'
21
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
3589 else:
3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
3592
3593 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-63-7c9804f87141> in <lambda>(x)
20 return 'Other'
21
---> 22 train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
<ipython-input-63-7c9804f87141> in title_mapper(x)
3 ## title mapper function
4 def title_mapper(x):
----> 5 if x.contains('Mr.'):
6 return 'Mr'
7 elif x.contains('Mrs.|Mme.'):
AttributeError: 'str' object has no attribute 'contains'
I looked at this question and its answers and adjusted my code:
Does Python have a string 'contains' substring method?
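But as far as I understand, the in operator cannot take multiple patterns like the one below, even with an r'' prefix on the string (I am using Python 3.7):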
'Capt.|Col.'
It only works when I hard-code every value. Is there a better or more efficient way to do this?
## title mapper function
def title_mapper(x):
    if 'Mr.' in x:
        return 'Mr'
    elif 'Mrs.' in x:
        return 'Mrs'
    elif 'Mme.' in x:
        return 'Mrs'
    elif 'Miss.' in x:
        return 'Miss'
    elif 'Mlle.' in x:
        return 'Miss'
    elif 'Dr.' in x:
        return 'Dr'
    elif 'Rev.' in x:
        return 'Cleric'
    elif 'Jonkheer.' in x:
        return 'Noble'
    elif 'Countess.' in x:
        return 'Noble'
    elif 'Don.' in x:
        return 'Noble'
    elif 'Ms.' in x:
        return 'Noble'
    elif 'Major.' in x:
        return 'Military'
    elif 'Col.' in x:
        return 'Military'
    elif 'Capt.' in x:
        return 'Military'
    else:
        return 'Other'

train['Title'] = train['Name'].apply(lambda x: title_mapper(x))
Answer 0 (score: 1)
If performance matters, use your last solution. You can also restructure the mapping as a dictionary for the mapper:
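# group name -> list of title substrings that belong to it
d = {'Mr': ['Mr.'],
     'Mrs': ['Mrs.', 'Mme.'],
     'Miss': ['Miss.', 'Mlle.'],
     'Dr': ['Dr.'],
     'Cleric': ['Rev.'],
     'Noble': ['Jonkheer.', 'Countess.', 'Don.', 'Ms.'],
     'Military': ['Major.', 'Col.', 'Capt.']}

# invert to: title substring -> group name
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

def title_mapper1(x):
    # return the group of the first title found in the name, or None if nothing matches
    for k, v in d1.items():
        if k in x:
            return v

# names without a known title come back as None, so fill them with 'Other'
train['Title1'] = train['Name'].apply(title_mapper1).fillna('Other')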
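As an alternative to looping over substrings, the title can be pulled out once with Series.str.extract and then looked up in the same kind of dictionary. The following is only a sketch under some assumptions: the sample rows, the column name Title2, and the regular expression (the first run of letters that follows a space and ends with a period) are illustrative choices, not part of the original question or answer, and the pattern may need adjusting for names that do not follow the "Surname, Title. Given names" layout.

```python
import pandas as pd

# Hypothetical sample rows in the "Surname, Title. Given names" layout.
train = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
    'Heikkinen, Miss. Laina',
    'Reynaldo, Ms. Encarnacion',
    'Aubart, Mme. Leontine Pauline',
    'Byles, Rev. Thomas Roussel Davids',
    'Crosby, Capt. Edward Gifford',
    'Palsson, Master. Gosta Leonard',
]})

# Same grouping as above, but keyed on the bare title without the period.
groups = {'Mr': ['Mr'],
          'Mrs': ['Mrs', 'Mme'],
          'Miss': ['Miss', 'Mlle'],
          'Dr': ['Dr'],
          'Cleric': ['Rev'],
          'Noble': ['Jonkheer', 'Countess', 'Don', 'Ms'],
          'Military': ['Major', 'Col', 'Capt']}

# Invert to: bare title -> group name.
title_to_group = {title: group
                  for group, titles in groups.items()
                  for title in titles}

# Assumed pattern: a space, then letters (captured), then a literal period.
# expand=False returns a Series holding the first match per row.
raw_title = train['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)

# Exact dictionary lookup; titles that are not listed become 'Other'.
train['Title2'] = raw_title.map(title_to_group).fillna('Other')
print(train[['Name', 'Title2']])
```

Because the lookup is an exact match on the extracted title, it does not depend on the order of the checks. Note also that in a regular expression an unescaped period matches any character, so a pattern such as 'Mr.' passed to str.contains also matches the start of 'Mrs.'; the period is escaped above for that reason.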