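I am currently working on a car emissions dataset and am cleaning/standardizing the car model names. The dataset is large, but here are the first 10 rows: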
import pandas as pd

cars_em_df = pd.DataFrame({
    'manufacturer_name_mapped': ['FIAT', 'FIAT', 'FIAT', 'FIAT', 'FIAT', 'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG'],
    'commercial_name': ['124 gt multiair auto', '500l wagon pop star t-jet',
                        'doblo combi 1.4 95', 'panda 0.9t sge 85 natural power', 'punto 1.4 77 lpg',
                        'x4 xdrive20d se auto', '216d active tourer b37 f45', '220d gran tourer b47 f46',
                        'x1 xdrive18d sport', '320i xdrive m sport gt auto'],
    'fuel_type_mapped': ['Petrol', 'Petrol', 'Petrol', 'NG-Biomethane', 'LPG', 'Diesel', 'Diesel', 'Diesel', 'Diesel', 'Petrol'],
    'file_year': [2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
    'emissions': [153, 158, 165, 86, 114, 131, 166, 200, 151, 149],
    'commercial_name_cleaned': ['124', '500', None, 'panda', 'punto', 'x4', None, None, 'x1', None]})
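The rightmost column, 'commercial_name_cleaned', is the result of my first cleaning pass, in which I matched the names in the 'commercial_name' column against a standard list of names from a different source. As you can see, these are very simple, short names. Whenever I could not match a model name, my function returned None.
As a second step, I now want to do the following: if the value is None, search the adjacent 'commercial_name' column for a specific string and replace the None with a model name that I specify. I tried: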
def str_ops(commercial_name_cleaned,commercial_name):
    if commercial_name_cleaned == None:
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
I then applied this function to the data frame:
cars_em_df['commercial_name_cleaned'] = cars_em_df.apply(lambda x: str_ops(str(x.commercial_name_cleaned), str(x.commercial_name)), axis=1)
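It is important to note that if '320', '220', etc. is not found in 'commercial_name', the function should not change anything and should simply return the value already present in 'commercial_name_cleaned'. However, when I apply the function, the entire 'commercial_name_cleaned' column turns into None values. So there must be something wrong with the function. Does anyone know how to fix this?
Any help is much appreciated, thank you!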
Answer 0 (score: 0)
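You are getting None values in the commercial_name_cleaned column because str_ops never returns anything for those rows: when a Python function does not explicitly return a value, it returns None. In addition, the str() calls in your apply turn a missing value into the string 'None', so the comparison with None never succeeds and none of the return statements is reached.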
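Both effects can be seen in isolation with a small plain-Python sketch (no pandas needed). The helper name `no_match` is made up for illustration and is not part of the question's code.

```python
def no_match(name):
    # Hypothetical helper: only returns a value when '320' is present.
    if '320' in name:
        return '3-series'
    # No return statement is reached otherwise, so Python returns None.

implicit_result = no_match('punto 1.4 77 lpg')

# This is what str(x.commercial_name_cleaned) produces for a missing value.
stringified = str(None)

print(implicit_result is None)   # True: falling off the end returns None
print(stringified == None)       # False: the string 'None' is not None
print(stringified == 'None')     # True
```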
Replace:
def str_ops(commercial_name_cleaned,commercial_name):
    if commercial_name_cleaned == None:
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
With:
def str_ops(commercial_name_cleaned,commercial_name):
    # str() in the apply call turns a missing value into the string 'None'
    if commercial_name_cleaned == 'None':
        if '216' in commercial_name:
            return '2-series'
        elif '220' in commercial_name:
            return '2-series'
        elif '320' in commercial_name:
            return '3-series'
    else:
        # a name was already matched in the first pass, so keep it
        return commercial_name_cleaned
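As an alternative sketch (not the code from the answer above), the str() conversion can be avoided altogether by treating any non-string value as missing. The function name `fill_model_name` and the `MODEL_PATTERNS` lookup table are assumptions introduced here for illustration; the sample rows are taken from the question's data.

```python
# Hypothetical lookup table: substring to search for -> model name to assign.
MODEL_PATTERNS = {'216': '2-series', '220': '2-series', '320': '3-series'}

def fill_model_name(cleaned, commercial_name):
    # Keep any name that was already matched in the first cleaning pass.
    if isinstance(cleaned, str):
        return cleaned
    # Missing value (None or NaN): look for a known substring.
    for pattern, model in MODEL_PATTERNS.items():
        if pattern in commercial_name:
            return model
    # Nothing matched: hand back the original missing value unchanged.
    return cleaned

rows = [('124', '124 gt multiair auto'),
        (None, 'doblo combi 1.4 95'),
        (None, '216d active tourer b37 f45'),
        (None, '320i xdrive m sport gt auto')]
results = [fill_model_name(cleaned, name) for cleaned, name in rows]
print(results)  # ['124', None, '2-series', '3-series']
```

On the data frame, this could presumably be applied the same way as before but without the str() calls, for example cars_em_df.apply(lambda x: fill_model_name(x.commercial_name_cleaned, x.commercial_name), axis=1). Because the function ends with an unconditional return, unmatched rows keep their original missing value instead of being overwritten.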