I have a DataFrame and a Series as shown below.
import numpy as np
import pandas as pd

user_response = pd.DataFrame({
    'val_string': ['Correct','Mute','Test13','Test15','Unverified',np.nan,'>10 Edu'],
    'num':[np.nan,np.nan,1201,1203,np.nan,np.nan,np.nan]
})
option_numbers = pd.DataFrame({
    'answer':['Correct','Incorrect','mute','cannot see','paralysed','illiterate','tired','cannot hear','NIL',
              'English','Malay','Mandarin','Hokkien','Teochew','Cantonese','Other - specify','Chinese',
              '0 Edu','1-6 Edu','7-10 Edu','>10 Edu','Unreachable','Incomplete','Unverified','Complete'],
    'option':[1,0,0,1,2,3,4,5,6,1,2,3,4,5,6,7,8,1,2,3,4,5,0,1,2]})
# Turn the lookup table into a Series indexed by answer text
option_numbers = option_numbers.set_index('answer')['option']
With the code below I can map the matching items successfully, but I lose the existing values of the non-matching items.
user_response['num'] = user_response['val_string'].map(option_numbers)
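If you run my code, you will see that it loses the values for Test13 and Test15, because they do not exist in the option_numbers Series. It also fails to match Mute, because the lookup is case-sensitive and the Series contains mute.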
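To make the behaviour concrete, here is a minimal sketch on shortened, hypothetical data (the names values, lookup and mapped are not part of my real code). It shows that Series.map returns NaN for any key without an exact, case-sensitive match:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-ins for the data above
values = pd.Series(['Correct', 'Mute', 'Test13', np.nan])
lookup = pd.Series({'Correct': 1, 'mute': 0})

# map looks each value up in the index of lookup;
# anything not found there (including 'Mute' vs 'mute') becomes NaN
mapped = values.map(lookup)
print(mapped)
```

Assigning such a result back to the num column is what overwrites the existing 1201 and 1203.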
Can you help me figure this out?
I would like my output to look like the following.
Answer 0 (score: 2)
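First, both columns need to be in the same case, either all uppercase or all lowercase. Note that the code below operates on option_numbers as the original DataFrame, before it is converted to a Series.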
user_response['val_string'] = user_response['val_string'].str.lower()
option_numbers['answer'] = option_numbers['answer'].str.lower()
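Then use fillna to fill in the missing values. For this to work, the index of both DataFrames must be set to the matching column, because fillna aligns on the index.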
user_response = user_response.set_index('val_string')
option_numbers = option_numbers.set_index('answer')
user_response['num'] = user_response['num'].fillna(option_numbers['option'])
user_response['num']
val_string
correct          1.0
mute             0.0
test13        1201.0
test15        1203.0
unverified       1.0
NaN              NaN
>10 edu          4.0
Name: num, dtype: float64
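If you would rather keep val_string as a regular column with its original casing, one possible variant is to lowercase only a temporary copy for the lookup. This is a sketch, not the only way to do it: it assumes the original DataFrames from the question, and the names lookup and mapped are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

user_response = pd.DataFrame({
    'val_string': ['Correct', 'Mute', 'Test13', 'Test15', 'Unverified', np.nan, '>10 Edu'],
    'num': [np.nan, np.nan, 1201, 1203, np.nan, np.nan, np.nan],
})
option_numbers = pd.DataFrame({
    'answer': ['Correct', 'Incorrect', 'mute', 'cannot see', 'paralysed', 'illiterate',
               'tired', 'cannot hear', 'NIL', 'English', 'Malay', 'Mandarin', 'Hokkien',
               'Teochew', 'Cantonese', 'Other - specify', 'Chinese', '0 Edu', '1-6 Edu',
               '7-10 Edu', '>10 Edu', 'Unreachable', 'Incomplete', 'Unverified', 'Complete'],
    'option': [1, 0, 0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 0, 1, 2],
})

# Build a lowercase lookup Series without modifying option_numbers itself
lookup = (
    option_numbers
    .assign(answer=option_numbers['answer'].str.lower())
    .set_index('answer')['option']
)

# Map a lowercased copy of val_string; unmatched or missing values become NaN
mapped = user_response['val_string'].str.lower().map(lookup)

# Fill only the gaps in num, so existing values such as 1201 and 1203 are kept
user_response['num'] = user_response['num'].fillna(mapped)
print(user_response)
```

Here the existing numbers survive because fillna only touches rows where num is NaN, and val_string stays unchanged because the lowercasing is applied to a temporary Series rather than written back.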