How do I update the empty rows of one pandas DataFrame column with data from another column?

Time: 2017-11-24 11:13:51

Tags: python-3.x pandas dataframe

I have the following script, which raises this error:

  

ValueError: cannot reindex from a duplicate axis

Code:

dataAll.loc[dataAll['GenderCode'] == '', 'GenderCode'] = dataAll.loc[dataAll['SEX.id'] != '', 'SEX.id']

In SQL, I would write this as:

update dataAll set GenderCode=SEX.id where GenderCode='' and SEX.id!=''

How can I achieve the same thing in pandas?

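I also ran the following script, but it had no effect: the number of rows with an empty GenderCode is the same before and after.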

dataAll['GenderCode'].unique()
array(['001', '002', '003', '004', '096', '098', '', 'KN.GA'], dtype=object)
dataAll['SEX.id'].unique()
array(['001', '002', '003', '004', '096', '098', ''], dtype=object)

temp = dataAll.loc[dataAll['GenderCode'] == '']
len(temp)
>> 684090

mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll['GenderCode'] = np.where(mask, dataAll['SEX.id'], dataAll['GenderCode'])

temp = dataAll.loc[dataAll['GenderCode'] == '']
len(temp)
>> 684090

1 answer:

Answer 0 (score: 1)

I believe you need to chain the two conditions with & and then set the values using the resulting mask:

mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll.loc[mask, 'GenderCode'] = dataAll['SEX.id']

Or:

dataAll['GenderCode'] = np.where(mask, dataAll['SEX.id'], dataAll['GenderCode'])
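The error in the question ("cannot reindex from a duplicate axis") usually indicates that the DataFrame's index contains repeated labels. The sketch below uses made-up data with such an index (the values and the name df are assumptions, not the asker's data) and applies the np.where variant to plain NumPy arrays, so the update is positional and does not rely on index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data: index labels 0 and 1 each appear twice.
df = pd.DataFrame(
    {"GenderCode": ["001", "", "", "002"], "SEX.id": ["", "003", "", "004"]},
    index=[0, 0, 1, 1],
)

# Rows where GenderCode is empty and SEX.id has a value.
mask = (df["GenderCode"] == "") & (df["SEX.id"] != "")

# to_numpy() drops the index, so values are matched by position only.
df["GenderCode"] = np.where(
    mask.to_numpy(), df["SEX.id"].to_numpy(), df["GenderCode"].to_numpy()
)

print(df["GenderCode"].tolist())
```

The third row stays empty because SEX.id is empty there as well.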

Sample:

Rows where SEX.id is empty as well (index=1) are not updated:
import numpy as np
import pandas as pd

dataAll = pd.DataFrame({'GenderCode':['a','','s',''],
                        'SEX.id':['','','b','d']})

print (dataAll)
  GenderCode SEX.id
0          a       
1                  
2          s      b
3                 d

mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll.loc[mask, 'GenderCode'] = dataAll['SEX.id']
print (dataAll)

  GenderCode SEX.id
0          a       
1                  
2          s      b
3          d      d

If you also want to handle rows where both columns are empty strings, add a second condition and assign a new value such as no_data:

m = dataAll['GenderCode'] == ''
m1 = m & (dataAll['SEX.id'] != '')
m2 =  m  & (dataAll['SEX.id'] == '')

dataAll['GenderCode'] = np.select([m1, m2], 
                                  [dataAll['SEX.id'], 'no_data'], 
                                  default=dataAll['GenderCode'])
print (dataAll)

  GenderCode SEX.id
0          a       
1    no_data       
2          s      b
3          d      d
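In the question, the count of rows with an empty GenderCode is unchanged after the np.where update, which would be consistent with the mask selecting no rows at all, that is, every row with an empty GenderCode also having an empty SEX.id. A quick way to check is to count the rows the mask selects. The sketch below does that on made-up data and also compares against a whitespace-tolerant mask; that some "empty" values might actually contain spaces is only an assumption and is not confirmed by the question:

```python
import pandas as pd

# Hypothetical data: some "empty" cells actually hold a single space.
df = pd.DataFrame(
    {"GenderCode": ["001", "", " ", ""], "SEX.id": ["002", "", "003", " "]}
)

# Exact comparison, as in the answer above.
exact = (df["GenderCode"] == "") & (df["SEX.id"] != "")
print("rows selected by exact mask:", exact.sum())

# Whitespace-tolerant comparison: treat strings of only spaces as empty.
gender_blank = df["GenderCode"].str.strip() == ""
sex_blank = df["SEX.id"].str.strip() == ""
stripped = gender_blank & ~sex_blank
print("rows selected after stripping:", stripped.sum())
```

If the exact mask selects zero rows on the real data, no assignment based on it can change GenderCode, and the np.select variant with a value such as no_data is the only one of the approaches above that would alter those rows.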