My dataframe contains these columns:
ID  Address1     Address1-State  Address1-City  Address2  Address2-State  Address2-City  Address  State  City
1   6th street   MN              Mpls
2   15th St      MI              Flint
3                MA              Boston         Essex St  NY              New York
4   7 street SE  MN              Mpls           8th St    IL              Chicago
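Now I want to populate the Address, State, and City fields as follows: if Address1 is blank, fill them from Address2 and the Address2 state and city fields.
For the data above, the final dataframe would look like this: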
ID  Address      State  City
1   6th street   MN     Mpls
2   15th St      MI     Flint
3   Essex St     NY     New York
4   7 street SE  MN     Mpls
Currently, I am doing this:
def fill_add(address1, address2):
    if address1 != '':
        address = address1
    elif address1 == '' and address2 != '':
        address = address2
    elif address1 == '' and address2 == '':
        address = ''
    return address

def fill_add_apply(df):
    df['Address'] = df.apply(lambda row: fill_add(row['Address1'], row['Address2']), axis=1)
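Do I need to do the same for all the other columns? Is there a better way?
To clarify: for ID = 3, the address, state, and city should be "Essex St NY New York", because Address1 is blank, so Address2 and its state and city should be chosen. In short, if Address1 is blank, then Address2, Address2-State, and Address2-City should be chosen even when Address1-State and Address1-City are not blank.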
Answer 0 (score: 1)
First normalize your column names, then use groupby + first:
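import numpy as np
df = df.replace('', np.nan)  # turn blanks into NaN so first()/last() skip them
df.columns = df.columns.str.replace(r'\d+', '', regex=True)  # 'Address1-State' -> 'Address-State'
df.columns = df.columns.str.split('-').str[-1]  # 'Address-State' -> 'State'
# Note: groupby(..., axis=1) is deprecated in recent pandas releases.
newdf = df.groupby(level=0, axis=1).first()  # first non-null value per duplicated column name
newdf.loc[df.iloc[:, 1].isnull(), :] = df.groupby(level=0, axis=1).last()  # Address1 blank: take the Address2 values
newdf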
Out[40]:
       Address      City  ID State
0   6th street      Mpls   1    MN
1      15th St     Flint   2    MI
2     Essex St  New York   3    NY
3  7 street SE      Mpls   4    MN
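Because grouping along axis=1 is deprecated in newer pandas, the same idea can be sketched by grouping the transposed frame instead. This is only a sketch under assumptions: the sample frame below is rebuilt from the question with blank cells as empty strings, and the names use_second, renamed, and grouped are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be empty strings.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Address1': ['6th street', '15th St', '', '7 street SE'],
    'Address1-State': ['MN', 'MI', 'MA', 'MN'],
    'Address1-City': ['Mpls', 'Flint', 'Boston', 'Mpls'],
    'Address2': ['', '', 'Essex St', '8th St'],
    'Address2-State': ['', '', 'NY', 'IL'],
    'Address2-City': ['', '', 'New York', 'Chicago'],
})

df = df.replace('', np.nan)
use_second = df['Address1'].isna()  # rows that must take the Address2 values

# Collapse 'Address1-State' / 'Address2-State' to 'State', and so on.
renamed = df.copy()
renamed.columns = (renamed.columns
                   .str.replace(r'\d+', '', regex=True)
                   .str.split('-').str[-1])

# Group the duplicated labels on the transposed frame instead of using axis=1.
grouped = renamed.T.groupby(level=0)
newdf = grouped.first().T                                 # first non-null value per label
newdf.loc[use_second] = grouped.last().T.loc[use_second]  # Address2 values where Address1 is blank
newdf = newdf[['ID', 'Address', 'State', 'City']]         # restore a readable column order
```

Transposing turns every column into object dtype, so this is more suitable for small, string-heavy frames like the one in the question than for large numeric data.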
Answer 1 (score: 1)
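import numpy as np
import pandas as pd
df = df.replace('', np.nan)  # treat blanks as missing values
addr_1 = ['ID', 'Address1', 'Address1-State', 'Address1-City']
addr_2 = ['ID', 'Address2', 'Address2-State', 'Address2-City']
# Start from the Address1 columns, renamed to the target names
new_df = pd.DataFrame(df[addr_1].values.copy(), columns=['ID', 'Address', 'State', 'City'])
# Where Address1 is missing, overwrite the whole row with the Address2 columns
new_df.loc[new_df['Address'].isnull(), :] = df.loc[df['Address1'].isnull(), addr_2].values
print(new_df)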
   ID      Address State      City
0   1   6th street    MN      Mpls
1   2      15th St    MI     Flint
2   3     Essex St    NY  New York
3   4  7 street SE    MN      Mpls
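The same selection can also be written column by column with Series.where, which avoids copying raw arrays. The following is a rough sketch, not the answerer's method: the sample frame is rebuilt from the question, and the sources mapping and the addr1_blank mask are hypothetical names introduced for illustration.

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be empty strings.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Address1': ['6th street', '15th St', '', '7 street SE'],
    'Address1-State': ['MN', 'MI', 'MA', 'MN'],
    'Address1-City': ['Mpls', 'Flint', 'Boston', 'Mpls'],
    'Address2': ['', '', 'Essex St', '8th St'],
    'Address2-State': ['', '', 'NY', 'IL'],
    'Address2-City': ['', '', 'New York', 'Chicago'],
})

# Treat both NaN and empty strings as "blank" (an assumption about the data).
addr1_blank = df['Address1'].isna() | df['Address1'].eq('')

# Hypothetical mapping: target column -> (first-choice source, fallback source).
sources = {
    'Address': ('Address1', 'Address2'),
    'State': ('Address1-State', 'Address2-State'),
    'City': ('Address1-City', 'Address2-City'),
}

result = df[['ID']].copy()
for target, (primary, fallback) in sources.items():
    # Keep the primary value unless Address1 is blank; then take the fallback.
    result[target] = df[primary].where(~addr1_blank, df[fallback])
```

Since the mask depends only on Address1, row 3 takes NY and New York even though its Address1-State and Address1-City are filled, which matches the rule stated in the question.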
Answer 2 (score: 0)
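(This assumes your index has no duplicates and that blank cells are empty strings.)
Select the index of the rows to fill from Address1:
Address1_index = df.loc[(df["Address1"] != "") & (df["Address1-State"] != "") & (df["Address1-City"] != "")].index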
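Then put the Address1 data into the target columns (.values avoids alignment on the differing column names):
df.loc[Address1_index, ["Address", "State", "City"]] = df.loc[Address1_index, ["Address1", "Address1-State", "Address1-City"]].values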
Now select the index of the rows to fill from Address2:
Address2_index = df.loc[(df["Address1"] == "") | (df["Address1-State"] == "") | (df["Address1-City"] == "")].index
Then fill those rows as well:
df.loc[Address2_index, ["Address", "State", "City"]] = df.loc[Address2_index, ["Address2", "Address2-State", "Address2-City"]].values
Drop the columns that are no longer needed:
df.drop(["Address1", "Address1-State", "Address1-City", "Address2", "Address2-State", "Address2-City"], axis=1, inplace=True)
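Put together, the steps above might look like the sketch below. It assumes that blank cells are empty strings (or NaN) and that the Address, State, and City columns already exist, as in the question's frame. The names addr1_cols, addr2_cols, target_cols, and use_addr1 are illustrative only.

```python
import pandas as pd

# Sample data from the question, including the (still blank) target columns.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Address1': ['6th street', '15th St', '', '7 street SE'],
    'Address1-State': ['MN', 'MI', 'MA', 'MN'],
    'Address1-City': ['Mpls', 'Flint', 'Boston', 'Mpls'],
    'Address2': ['', '', 'Essex St', '8th St'],
    'Address2-State': ['', '', 'NY', 'IL'],
    'Address2-City': ['', '', 'New York', 'Chicago'],
    'Address': ['', '', '', ''],
    'State': ['', '', '', ''],
    'City': ['', '', '', ''],
})

addr1_cols = ['Address1', 'Address1-State', 'Address1-City']
addr2_cols = ['Address2', 'Address2-State', 'Address2-City']
target_cols = ['Address', 'State', 'City']

# Rows where every Address1 field is filled use Address1; all others use Address2.
use_addr1 = df[addr1_cols].fillna('').ne('').all(axis=1)

for cols, mask in ((addr1_cols, use_addr1), (addr2_cols, ~use_addr1)):
    # .to_numpy() skips label alignment, since source and target names differ.
    df.loc[mask, target_cols] = df.loc[mask, cols].to_numpy()

# Drop the source columns, leaving ID, Address, State, City.
df = df.drop(columns=addr1_cols + addr2_cols)
```

This mask switches to Address2 when any Address1 field is blank, which is slightly broader than the rule in the question. To follow the question exactly, the mask could be built from the Address1 column alone, for example df['Address1'].fillna('').ne('').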