I have the following script, which raises an error:
ValueError: cannot reindex from a duplicate axis
Code:
dataAll.loc[dataAll['GenderCode'] == '', 'GenderCode'] = dataAll.loc[dataAll['SEX.id'] != '', 'SEX.id']
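One likely cause, assuming dataAll has repeated index labels (for example after pd.concat without ignore_index=True): when the right-hand side of a .loc assignment is a Series, pandas first aligns it to the labels of the selected rows, and that reindexing step fails when labels repeat. Newer pandas versions may word the message differently. The small hypothetical frame below reproduces the error and shows that np.where, which works by position, is not affected:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a duplicated index (labels 0 and 1 each appear twice).
df = pd.DataFrame(
    {"GenderCode": ["001", "", "", "002"], "SEX.id": ["001", "002", "", "004"]},
    index=[0, 0, 1, 1],
)

# The statement from the question: the right-hand Series is aligned on index
# labels, which needs a reindex that is impossible with duplicate labels.
try:
    df.loc[df["GenderCode"] == "", "GenderCode"] = df.loc[df["SEX.id"] != "", "SEX.id"]
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__, error)

# np.where compares and picks values position by position; the index is never used.
mask = (df["GenderCode"] == "") & (df["SEX.id"] != "")
df["GenderCode"] = np.where(mask, df["SEX.id"], df["GenderCode"])
print(df)
```

If dataAll.index.is_unique is False on the real data, this is probably the source of the error.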
In SQL, I would write it as:
update dataAll set GenderCode=SEX.id where GenderCode='' and SEX.id!=''
How can I do this in pandas?
I ran the following script, but it did not work:
dataAll['GenderCode'].unique()
array(['001', '002', '003', '004', '096', '098', '', 'KN.GA'], dtype=object)
dataAll['SEX.id'].unique()
array(['001', '002', '003', '004', '096', '098', ''], dtype=object)
temp = dataAll.loc[dataAll['GenderCode'] == '']
len(temp)
>> 684090
mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll['GenderCode'] = np.where(mask, dataAll['SEX.id'], dataAll['GenderCode'])
temp = dataAll.loc[dataAll['GenderCode'] == '']
len(temp)
>> 684090
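Since np.where ran without an error but the number of empty GenderCode values stayed at 684090, a quick check is whether any of those rows actually have a non-empty SEX.id, and whether the "empty" values are really '' rather than whitespace or NaN. A minimal diagnostic sketch on a small hypothetical frame (the helper name count_fillable is illustrative, not part of pandas):

```python
import numpy as np
import pandas as pd


def count_fillable(df: pd.DataFrame, target: str, source: str) -> dict:
    """Hypothetical helper: break down the rows whose `target` is empty."""
    # Treat NaN as '' and strip whitespace so that ' ' counts as empty too.
    tgt = df[target].fillna("").astype(str).str.strip()
    src = df[source].fillna("").astype(str).str.strip()
    empty_target = tgt == ""
    return {
        "empty_target": int(empty_target.sum()),
        "fillable": int((empty_target & (src != "")).sum()),
        "both_empty": int((empty_target & (src == "")).sum()),
    }


# Hypothetical sample: one GenderCode is whitespace-only, one SEX.id is NaN.
sample = pd.DataFrame(
    {"GenderCode": ["001", "", " ", ""], "SEX.id": ["001", "", "002", np.nan]}
)
print(count_fillable(sample, "GenderCode", "SEX.id"))
```

If "fillable" comes out as 0 on the real data, the mask selects no rows and np.where has nothing to change; the case where both columns are empty is handled further down in the answer.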
Answer (score: 1):
I believe you need to chain both conditions with & and then set the values by the mask:
mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll.loc[mask, 'GenderCode'] = dataAll['SEX.id']
Or:
dataAll['GenderCode'] = np.where(mask, dataAll['SEX.id'], dataAll['GenderCode'])
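Note that dataAll.loc[mask, 'GenderCode'] = dataAll['SEX.id'] still aligns the right-hand side on index labels, so if the index of dataAll contains duplicates it can raise the same ValueError as in the question. A sketch that keeps the .loc form but assigns by position, passing a NumPy array instead of a Series (the duplicated index here is a hypothetical example):

```python
import pandas as pd

# Hypothetical frame with a non-unique index.
dataAll = pd.DataFrame(
    {"GenderCode": ["a", "", "s", ""], "SEX.id": ["", "", "b", "d"]},
    index=[0, 0, 1, 1],
)

mask = (dataAll["GenderCode"] == "") & (dataAll["SEX.id"] != "")
# .to_numpy() drops the index, so the values are matched by position
# among the rows selected by the mask, not by label.
dataAll.loc[mask, "GenderCode"] = dataAll.loc[mask, "SEX.id"].to_numpy()
print(dataAll)
```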
Sample:
Note that if both columns are empty (the row with index=1), the value is not replaced:
import pandas as pd

dataAll = pd.DataFrame({'GenderCode':['a','','s',''],
                        'SEX.id':['','','b','d']})
print (dataAll)
  GenderCode SEX.id
0          a
1
2          s      b
3                 d
mask = (dataAll['GenderCode'] == '') & (dataAll['SEX.id'] != '')
dataAll.loc[mask, 'GenderCode'] = dataAll['SEX.id']
print (dataAll)
  GenderCode SEX.id
0          a
1
2          s      b
3          d      d
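If you also want to fill rows where both columns are empty strings, add another condition and assign a new value such as no_data, for example with np.select: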
m = dataAll['GenderCode'] == ''
m1 = m & (dataAll['SEX.id'] != '')
m2 = m & (dataAll['SEX.id'] == '')
dataAll['GenderCode'] = np.select([m1, m2],
                                  [dataAll['SEX.id'], 'no_data'],
                                  default=dataAll['GenderCode'])
print (dataAll)
  GenderCode SEX.id
0          a
1    no_data
2          s      b
3          d      d
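The same logic can be wrapped in a small reusable function that mirrors the SQL UPDATE ... WHERE from the question. Because it works on NumPy arrays, it also runs on a frame with a duplicated index. The function name fill_from and its fallback parameter are hypothetical, chosen for this sketch:

```python
import numpy as np
import pandas as pd


def fill_from(df, target, source, fallback=None):
    """Return `target` with empty strings filled from `source`.

    Rows where both columns are empty get `fallback` if it is given,
    otherwise they stay ''. Works by position, so duplicate index
    labels are not a problem.
    """
    tgt = df[target].to_numpy(dtype=object)
    src = df[source].to_numpy(dtype=object)
    empty = tgt == ""
    conditions = [empty & (src != "")]
    choices = [src]
    if fallback is not None:
        conditions.append(empty & (src == ""))
        choices.append(fallback)
    return np.select(conditions, choices, default=tgt)


dataAll = pd.DataFrame(
    {"GenderCode": ["a", "", "s", ""], "SEX.id": ["", "", "b", "d"]},
    index=[0, 0, 1, 1],  # duplicated labels on purpose
)
dataAll["GenderCode"] = fill_from(dataAll, "GenderCode", "SEX.id", fallback="no_data")
print(dataAll)
```

Leaving fallback as None reproduces the first mask-based version, where rows with both columns empty keep ''.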