I have input like this:
zip state
95648 CA
95683 CA
95648 NaN
95648 CA
95649 CA
I want to fill in the missing state values by looking them up from the zip. The output should be:
zip state
95648 CA
95683 CA
95648 **CA**
95648 CA
95649 CA
So far I have tried something like this:
1. Create a zip-to-state map.
2. Copy the zip column to zip1.
3. Replace the values in zip with the mapped state.
4. Copy zip into state, restore zip from zip1, and delete zip1.
But I am looking for a better way. The values are loaded into data (as a DataFrame):
map1 = data[['zip','state']]
map1 = data.set_index('zip')['state'].to_dict()
print(map1) produces: {95838: 'CA', 95823: 'CA', 95815: 'CA', 95834: 'CA', 95828: 'CA'}
data['zip1'] = data['zip']
data = data.replace({"zip": map1})
print (data.head(10))
data['state'] = data['zip']
data['zip'] = data['zip1']
data = data.drop(['zip1'],axis=1)
print (data.head(10))
Answer 0 (score: 0)
After creating the map, you can use pd.Series.map(), which accepts a dictionary as its argument.
map1 = data.set_index('zip')['state'].dropna().to_dict()
data['state'] = data['zip'].map(map1)
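As a minimal sketch of the map approach, here it is run against the sample rows from the question. The DataFrame is rebuilt by hand here (the real data is presumably loaded from a file), and wrapping the lookup in fillna() is an optional variation, not part of the answer: it keeps any existing state value and only fills the gaps.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration
data = pd.DataFrame({
    "zip": [95648, 95683, 95648, 95648, 95649],
    "state": ["CA", "CA", np.nan, "CA", "CA"],
})

# zip -> state lookup built only from rows that have a state
map1 = data.set_index("zip")["state"].dropna().to_dict()

# Keep existing states and fill only the missing ones from the lookup
data["state"] = data["state"].fillna(data["zip"].map(map1))

print(data)
```

Note that a plain data['zip'].map(map1) overwrites the whole column, so any zip that is absent from the map ends up as NaN.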
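Alternatively, if the DataFrame itself already contains all the information about the zip/state pairings, you can also use a one-liner: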
data['state'] = data.sort_values('state').groupby('zip')['state'].ffill()
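The one-liner relies on sort_values() placing NaN last, so within each zip group every missing state follows a known one and can be forward-filled. The sketch below uses the question's rows plus two hypothetical 10001 rows (not in the original data) where the missing value comes first. It calls .ffill() on the groupby, which should behave the same as fillna(method='ffill') while avoiding the deprecation warning in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Question's rows plus two hypothetical 10001 rows where the NaN comes first
data = pd.DataFrame({
    "zip": [95648, 95683, 95648, 95648, 95649, 10001, 10001],
    "state": ["CA", "CA", np.nan, "CA", "CA", np.nan, "NY"],
})

# Sorting by state moves NaN rows to the end, so each group's known
# state precedes its missing ones; ffill then copies it downward
filled = data.sort_values("state").groupby("zip")["state"].ffill()

# Assignment aligns on the original index, so the row order is unchanged
data["state"] = filled

print(data)
```

A zip whose rows are all missing a state stays NaN, since there is nothing to fill from.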