I'm new to coding. I have two data frames, shown below:
raw_data:
country_code  homicides_per_100k
ABC           2.6
ABB           nan
ACC           nan
homi_set:
Country Code  year
ABC           2.6
ACC           11
ABB           3.1
ADD           0.5
The two data frames differ in row order and shape.
How can I replace the nan values in raw_data with the data from homi_set?
My code is below. It does not work:
for row, homicide in enumerate(raw_data['homicides_per_100k']):
    if homicide == "":
        country_code = raw_data.loc[row, 'country_code']
        homi_set_index = homi_set.index[homi_set['Country Code'] == country_code]
        homi_value = homi_set.loc[homi_set_index, '2014']
        raw_data.loc[row, 'homicides_per_100k'] = homi_value
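Answer 0 (score: 0)
Use set_index + combine_first. Setting the index lets the values be aligned and filled in by country code. Where the two frames disagree, combine_first keeps the non-null values already in raw_data and only fills its nan entries from homi_set. It returns a new frame indexed by the union of both indexes (so ADD would appear as well), so assign the result and reindex it to raw_data's index to keep only the original countries.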
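raw_data = raw_data.set_index('country_code')
# combine_first returns a new frame, so the result must be assigned
filled = raw_data.combine_first(homi_set.set_index('Country Code')
                                .rename(columns={'year': 'homicides_per_100k'}))
# keep only the countries (and the row order) originally in raw_data
raw_data = filled.reindex(raw_data.index)
print(raw_data)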
homicides_per_100k
country_code
ABC 2.6
ABB 3.1
ACC 11.0
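To make the precedence rule concrete, here is a minimal self-contained sketch. It assumes the sample frames from the question, except that the ABC value in homi_set is deliberately changed to 9.9 (a made-up number, not from the question) so that the two frames disagree. The names left, right, combined and result are only illustrative.

```python
import numpy as np
import pandas as pd

raw_data = pd.DataFrame({
    "country_code": ["ABC", "ABB", "ACC"],
    "homicides_per_100k": [2.6, np.nan, np.nan],
})
homi_set = pd.DataFrame({
    "Country Code": ["ABC", "ACC", "ABB", "ADD"],
    "year": [9.9, 11.0, 3.1, 0.5],  # ABC altered on purpose (hypothetical value)
})

# Give both frames the same index name and column name so they align.
left = raw_data.set_index("country_code")
right = (homi_set
         .rename(columns={"Country Code": "country_code",
                          "year": "homicides_per_100k"})
         .set_index("country_code"))

# The caller (left) wins wherever it has a non-null value;
# right is only used to fill left's missing entries.
combined = left.combine_first(right)

# combined covers the union of both indexes (ADD included),
# so restrict it to the countries originally in raw_data.
result = combined.reindex(left.index).reset_index()
print(result)
```

In this sketch ABC stays at 2.6 rather than 9.9, while ABB and ACC are filled from homi_set. If the opposite precedence is wanted, calling right.combine_first(left) swaps the roles.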
Answer 1 (score: 0)
import pandas as pd
import numpy as np
# Just Creating your dataframes
raw_data = pd.DataFrame([('ABC', 2.6), ('ABB', np.nan), ('ACC', np.nan)], columns=['Country_code', 'homicides_per_100k'] )
homi_set = pd.DataFrame([('ABC', 2.6), ('ACC', 11), ('ABB', 3.1), ('ADD', 0.5)], columns=['Country_code', 'year'] )
# Left Join
new_set = pd.merge(raw_data, homi_set, on='Country_code', how='left')
# condition on the column
new_set['homicides_per_100k'] = np.where(new_set['homicides_per_100k'].isnull(), new_set['year'], new_set['homicides_per_100k'] )
del new_set['year']
new_set
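For comparison, the same fill can be sketched without a merge by using Series.map and fillna, keeping the column names exactly as they appear in the question. This is an alternative technique, not the method used in either answer, and it assumes that every Country Code appears only once in homi_set. The name lookup is illustrative.

```python
import numpy as np
import pandas as pd

raw_data = pd.DataFrame({
    "country_code": ["ABC", "ABB", "ACC"],
    "homicides_per_100k": [2.6, np.nan, np.nan],
})
homi_set = pd.DataFrame({
    "Country Code": ["ABC", "ACC", "ABB", "ADD"],
    "year": [2.6, 11.0, 3.1, 0.5],
})

# Build a code -> value lookup Series (assumes unique country codes).
lookup = homi_set.set_index("Country Code")["year"]

# Map each row's country code to its homi_set value, then use those
# values only where homicides_per_100k is missing.
raw_data["homicides_per_100k"] = raw_data["homicides_per_100k"].fillna(
    raw_data["country_code"].map(lookup)
)
print(raw_data)
```

This leaves the row order and shape of raw_data unchanged, and countries that exist only in homi_set (such as ADD) are ignored.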