I have the following DataFrame in pandas:
```
code  job_descr          job_type
123   sales executive    nan
124   data scientist     nan
145   marketing manager  nan
132   finance            nan
144   data analyst       nan
```
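To make the examples reproducible, the frame can be built like this. This is a sketch: the values are copied from the table above, and filling `job_type` with `np.nan` is an assumption about how the empty column was created.

```python
import numpy as np
import pandas as pd

final_df = pd.DataFrame({
    "code": [123, 124, 145, 132, 144],
    "job_descr": ["sales executive", "data scientist", "marketing manager",
                  "finance", "data analyst"],
    "job_type": np.nan,  # scalar is broadcast to every row
})
print(final_df)
```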
I want to map `job_descr` to `job_type` as follows:

```
sales        : Sales
marketing    : Marketing
finance      : Finance
data science : Analytics
analyst      : Analytics
```
I'm trying the following in pandas:
```python
def job_type_redifine(column_name):
    if column_name.str.contains('sales'):
        return 'Sales'
    elif column_name.str.contains('marketing'):
        return 'Marketing'
    elif column_name.str.contains('data science|data scientist|analyst|machine learning'):
        return 'Analytics'
    else:
        return 'Others'

final_df['job_type'] = final_df.apply(lambda row:
                                      job_type_redifine(row['job_descr']), axis=1)
```
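The attempt fails because `apply(..., axis=1)` passes each row as a Series, so `row['job_descr']` is a plain Python `str`, and the `.str` accessor only exists on a Series. A minimal check, using a hypothetical one-row frame for illustration:

```python
import pandas as pd

final_df = pd.DataFrame({"code": [123], "job_descr": ["sales executive"]})
row = final_df.iloc[0]          # one row, as apply(axis=1) would pass it
value = row["job_descr"]
print(type(value))              # <class 'str'>

try:
    value.str.contains("sales")  # a str has no .str accessor
except AttributeError as exc:
    print(exc)                   # 'str' object has no attribute 'str'

# Series.str.contains works on the whole column instead of one value
print(final_df["job_descr"].str.contains("sales").tolist())  # [True]
```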
Desired DataFrame:

```
code  job_descr          job_type
123   sales executive    Sales
124   data scientist     Analytics
145   marketing manager  Marketing
132   finance            Finance
144   data analyst       Analytics
```
Answer
The first solution uses `numpy.select` with `Series.str.contains`. Its advantage is that it handles missing values, but it is slower:
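```python
import numpy as np

# One boolean mask per category; na=False treats missing descriptions as "no match"
m1 = final_df['job_descr'].str.contains('sales', na=False)
m2 = final_df['job_descr'].str.contains('marketing', na=False)
m3 = final_df['job_descr'].str.contains('data science|data scientist|analyst|machine learning', na=False)

# np.select takes the choice of the first True mask in each row, else the default
final_df['job_type'] = np.select([m1, m2, m3],
                                 ['Sales', 'Marketing', 'Analytics'], default='Others')
print(final_df)
```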
```
   code          job_descr   job_type
0   123    sales executive      Sales
1   124     data scientist  Analytics
2   145  marketing manager  Marketing
3   132            finance     Others
4   144       data analyst  Analytics
```
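The missing-value advantage depends on `str.contains` returning strictly boolean masks. On an object-dtype column a missing description produces NaN in the mask unless `na=False` is passed, and recent NumPy versions reject non-boolean conditions in `np.select` with a `TypeError`. A small sketch with a hypothetical three-row frame containing one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"job_descr": ["sales executive", np.nan, "data analyst"]})

# na=False turns the missing description into False, so each mask is bool dtype
m1 = df["job_descr"].str.contains("sales", na=False)
m2 = df["job_descr"].str.contains("marketing", na=False)
m3 = df["job_descr"].str.contains(
    "data science|data scientist|analyst|machine learning", na=False)

# The row with no match in any mask (here the missing one) gets the default
df["job_type"] = np.select([m1, m2, m3],
                           ["Sales", "Marketing", "Analytics"], default="Others")
print(df)
```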
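The second solution uses `Series.apply` and tests for matches with the `in` operator. It loops over each value in Python, but it is still faster because pandas string methods are comparatively slow. Unlike the first solution, it raises a `TypeError` if a description is missing (NaN). Its disadvantage is the last condition, which gets complicated with many `in` and `or` clauses: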
```python
def job_type_redifine(column_name):
    # column_name is a single string from job_descr; `in` checks for a substring
    if 'sales' in column_name:
        return 'Sales'
    elif 'marketing' in column_name:
        return 'Marketing'
    elif ('data science' in column_name or 'data scientist' in column_name
          or 'analyst' in column_name or 'machine learning' in column_name):
        return 'Analytics'
    else:
        return 'Others'

# Series.apply calls the function once per value, not once per row
final_df['job_type'] = final_df['job_descr'].apply(job_type_redifine)
print(final_df)
```
```
   code          job_descr   job_type
0   123    sales executive      Sales
1   124     data scientist  Analytics
2   145  marketing manager  Marketing
3   132            finance     Others
4   144       data analyst  Analytics
```
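One way to tame the long `or` chain is to keep the keywords in an ordered table and test them with `any()`. This is a sketch, not the answer's code: `RULES` and `classify_job` are hypothetical names, the table is built from the mapping in the question (so it includes `finance`, which neither function above handles), and non-string values fall back to `'Others'` instead of raising:

```python
import pandas as pd

# Ordered (label, keywords) pairs built from the question's mapping.
# The first matching rule wins, mirroring the if/elif order above.
RULES = [
    ("Sales", ("sales",)),
    ("Marketing", ("marketing",)),
    ("Finance", ("finance",)),
    ("Analytics", ("data science", "data scientist", "analyst", "machine learning")),
]


def classify_job(descr, rules=RULES, default="Others"):
    """Return the label of the first rule with a keyword contained in descr."""
    if not isinstance(descr, str):  # NaN/None: return the default, no TypeError
        return default
    for label, keywords in rules:
        if any(keyword in descr for keyword in keywords):
            return label
    return default


final_df = pd.DataFrame({
    "code": [123, 124, 145, 132, 144],
    "job_descr": ["sales executive", "data scientist", "marketing manager",
                  "finance", "data analyst"],
})
final_df["job_type"] = final_df["job_descr"].apply(classify_job)
print(final_df)
```

Adding a category then means adding one entry to the table instead of another `elif` branch, and with `finance` included the result matches the desired DataFrame from the question.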