I am trying to fill the null values in job_industry_category from a lookup DataFrame. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['job_title'] = ['Executive Secretary', 'Administrative Officer', 'Recruiting Manager', 'Senior Editor', 'Media Manager I']
df['job_industry_category'] = ['Health', 'Financial Services', 'Property', np.nan, np.nan]
df
job_title job_industry_category
0 Executive Secretary Health
1 Administrative Officer Financial Services
2 Recruiting Manager Property
3 Senior Editor NaN
4 Media Manager I NaN
lookup = pd.DataFrame()
lookup['job_title'] = ['Executive Secretary', 'Senior Editor', 'Media Manager I']
lookup['job_industry_category'] = ['Retail', 'Manufacturing', 'Health']
lookup
job_title job_industry_category
0 Executive Secretary Retail
1 Senior Editor Manufacturing
2 Media Manager I Health
My expected result would be:
df
job_title job_industry_category
0 Executive Secretary Health
1 Administrative Officer Financial Services
2 Recruiting Manager Property
3 Senior Editor Manufacturing
4 Media Manager I Health
I tried using map, like this:
df.loc[df['job_industry_category'].isnull(), 'job_industry_category'] = lookup['job_title'].map(lookup)
and a remove_na function taken from another post:
def remove_na(x):
    if pd.isnull(x['job_industry_category']):
        return freq_job_ind[x['job_title']]
    else:
        return x['job_industry_category']
df['job_industry_category'] = df.apply(remove_na, axis=1)
But neither works, and I'm not sure whether there is a better way to do this. Thanks in advance!
Answer 0 (score: 0)
# Boolean mask selecting the rows where the category is NaN
m = df.job_industry_category.isna()
# Fill the masked rows by mapping job_title through a dict of lookup['job_title']: lookup['job_industry_category']
df.loc[m, 'job_industry_category'] = df.loc[m, 'job_title'].map(dict(zip(lookup.job_title, lookup.job_industry_category)))
df
job_title job_industry_category
0 Executive Secretary Health
1 Administrative Officer Financial Services
2 Recruiting Manager Property
3 Senior Editor Manufacturing
4 Media Manager I Health
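For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same dict-based idea. It swaps the boolean mask for fillna, which is a variation on the answer above rather than the answer's exact code; the names title_to_category, mapped and filled are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'job_title': ['Executive Secretary', 'Administrative Officer',
                  'Recruiting Manager', 'Senior Editor', 'Media Manager I'],
    'job_industry_category': ['Health', 'Financial Services',
                              'Property', np.nan, np.nan]})
lookup = pd.DataFrame({
    'job_title': ['Executive Secretary', 'Senior Editor', 'Media Manager I'],
    'job_industry_category': ['Retail', 'Manufacturing', 'Health']})

# Illustrative name: plain dict of job_title -> job_industry_category
title_to_category = dict(zip(lookup['job_title'],
                             lookup['job_industry_category']))

# Map every title through the dict; titles absent from the dict become NaN
mapped = df['job_title'].map(title_to_category)

# fillna only touches rows where the original value is missing,
# so 'Executive Secretary' keeps 'Health' instead of taking 'Retail'
filled = df['job_industry_category'].fillna(mapped)
print(filled.tolist())
```

Assigning filled back to df['job_industry_category'] gives the expected result from the question. Existing values are never overwritten, because fillna ignores the mapped value wherever the original is not null.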
Answer 1 (score: 0)
Use isna() to find the positions of the missing values, then use map together with set_index.
% ipython
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'job_title': ['Executive Secretary', 'Administrative Officer',
'Recruiting Manager', 'Senior Editor', 'Media Manager I'],
'job_industry_category': ['Health', 'Financial Services',
'Property', np.nan, np.nan]})
df
Out[1]:
job_title job_industry_category
0 Executive Secretary Health
1 Administrative Officer Financial Services
2 Recruiting Manager Property
3 Senior Editor NaN
4 Media Manager I NaN
In [2]:
lookup = pd.DataFrame({
'job_title': ['Executive Secretary', 'Senior Editor', 'Media Manager I'],
'job_industry_category': ['Retail', 'Manufacturing', 'Health']})
lookup
Out[2]:
job_title job_industry_category
0 Executive Secretary Retail
1 Senior Editor Manufacturing
2 Media Manager I Health
In [3]:
missing = df['job_industry_category'].isna()
In [4]:
df.loc[missing, 'job_industry_category'] = df.loc[missing, 'job_title'].map(
lookup.set_index('job_title')['job_industry_category'])
df
Out[4]:
job_title job_industry_category
0 Executive Secretary Health
1 Administrative Officer Financial Services
2 Recruiting Manager Property
3 Senior Editor Manufacturing
4 Media Manager I Health
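One caveat with the set_index approach: as far as I know, map expects the lookup Series to have a unique index, so a lookup table with a repeated job_title is likely to raise an error. The sketch below assumes a hypothetical lookup with a duplicate title and a job title ('Unknown Role') that has no match; keeping the first occurrence is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'job_title': ['Executive Secretary', 'Senior Editor',
                  'Media Manager I', 'Unknown Role'],
    'job_industry_category': ['Health', np.nan, np.nan, np.nan]})

# Hypothetical lookup in which 'Senior Editor' appears twice
lookup = pd.DataFrame({
    'job_title': ['Senior Editor', 'Senior Editor', 'Media Manager I'],
    'job_industry_category': ['Manufacturing', 'Retail', 'Health']})

# Keep the first category per title so the resulting index is unique
unique_lookup = (lookup.drop_duplicates(subset='job_title', keep='first')
                       .set_index('job_title')['job_industry_category'])

missing = df['job_industry_category'].isna()
df.loc[missing, 'job_industry_category'] = (
    df.loc[missing, 'job_title'].map(unique_lookup))

# Titles with no entry in the lookup simply stay missing
print(df)
```

Rows whose title is not in the lookup remain NaN, so a follow-up isna() check shows which titles still need a category.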