I have the following two datasets in Python 2.7:
df1:
D_ID       D_NBR  D_ID   D_HR_LVL
851669006  8383   93433  IT
260969003  7337   83189  CORP
7383       8300   72521  FIN
260969003  6262   66611  No Data
919832001  22922  90111  IT
749277000  81123  53621  FIN
3353       6363   99931  No Data
df2:
U_ID       U_NBR
851669006  851669
749277000  749277
749838000  788363
919832001  919832
260969003  260969
Requirement:
if df1.D_HR_LVL == 'IT'
then get df2.U_NBR using df2.U_ID
elif df1.D_HR_LVL == 'FIN'
then split df2.U_NBR in 3 and 2 digits
else
keep the values as they are
What I tried:
a1 = df1.D_ID.astype(str).where(df1.D_HR_LVL.eq("IT"))
a2 = df1.D_ID.map(df2.set_index('U_ID').U_NBR.astype(str))
ncol = (df1.D_ID.astype(str).str.extract(r'(\d{3})(\d+)').where(df1.D_HR_LVL.eq("FIN")).rename(columns=lambda x: 'N_COL{}'.format(x+1)))
mer_df = pd.concat([df1,a1,a2,ncol],axis=1)
But I get the following error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
The goal is to get the following output:
new_df:
D_ID       D_NBR  D_ID   D_HR_LVL  U_NBR   N_Col_1  N_Col_2
851669006  8383   93433  IT        851669
260969003  7337   83189  CORP
7383       8300   72521  FIN
260969003  6262   66611  No Data
919832001  22922  90111  IT        919832
749277000  81123  53621  FIN       749277  749      27
3353       6363   99931  No Data
Any constructive help or suggestions would be greatly appreciated.
Answer 0 (score: 2)
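The error is probably caused by df1 having two columns named "D_ID". If you rename them, for example with df1.columns = [u'D_ID', u'D_NBR', u'D_ID2', u'D_HR_LVL'], your code will run, but it will not produce exactly the output you want.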
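To see why the duplicated label matters, here is a minimal sketch on a two-row toy frame (the data is a cut-down copy of df1 from the question, built here only for illustration). With two columns called "D_ID", selecting that label returns a DataFrame rather than a Series, which is the kind of ambiguity that the reindexing inside pd.concat cannot resolve.

```python
import pandas as pd

# Toy frame with the same duplicated "D_ID" label as df1 in the question.
df1 = pd.DataFrame(
    [[851669006, 8383, 93433, "IT"],
     [749277000, 81123, 53621, "FIN"]],
    columns=["D_ID", "D_NBR", "D_ID", "D_HR_LVL"],
)

# Selecting a duplicated label returns both columns as a DataFrame,
# not a single Series.
selected = df1["D_ID"]
print(type(selected).__name__)
print(df1.columns.is_unique)

# Renaming the second occurrence makes the column labels unique again,
# so the same selection now yields a Series.
df1.columns = ["D_ID", "D_NBR", "D_ID2", "D_HR_LVL"]
print(type(df1["D_ID"]).__name__)
print(df1.columns.is_unique)
```

Checking df1.columns.is_unique before concatenating or reindexing is a cheap way to catch this early.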
Here is another approach: first merge df1 with df2 on their common ID column, then adjust the values according to your requirements, for example:
df1.columns = [u'D_ID', u'D_NBR', u'D_ID2', u'D_HR_LVL']
mer_df = df1.merge(df2.rename(columns={'U_ID':'D_ID'}),how='left').fillna('')
# "else" requirement: blank out U_NBR for rows whose D_HR_LVL is neither IT nor FIN
mer_df.loc[~mer_df.D_HR_LVL.isin(['IT','FIN']),'U_NBR'] = ''
#create the column N_Col_1 and N_Col_2
mask_FIN = (mer_df.D_HR_LVL=='FIN') & (mer_df.U_NBR != '') #mask to select the rows
mer_df.loc[mask_FIN,'N_Col_1'] = mer_df.loc[mask_FIN,'D_ID'].astype(str).str[:3]
mer_df.loc[mask_FIN,'N_Col_2'] = mer_df.loc[mask_FIN,'D_ID'].astype(str).str[3:5]
# replace the remaining NaN values with empty strings
mer_df = mer_df.fillna('')
print (mer_df)
   D_ID       D_NBR  D_ID2  D_HR_LVL  U_NBR   N_Col_1  N_Col_2
0  851669006  8383   93433  IT        851669
1  260969003  7337   83189  CORP
2  7383       8300   72521  FIN
3  260969003  6262   66611  No Data
4  919832001  22922  90111  IT        919832
5  749277000  81123  53621  FIN       749277  749      27
6  3353       6363   99931  No Data
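For reference, here is a self-contained sketch of the same idea that rebuilds both frames from the question's sample data, so it can be run on its own. It differs from the code above in two ways, both of which are my assumptions rather than part of the original answer: it looks up U_NBR with Series.map instead of merge, and it splits U_NBR itself (as the requirement states) rather than D_ID. For this sample data both give 749 and 27. U_NBR is stored as strings here so that it can be sliced.

```python
import pandas as pd

# Sample data from the question; the second "D_ID" is already renamed to "D_ID2".
df1 = pd.DataFrame(
    [[851669006, 8383, 93433, "IT"],
     [260969003, 7337, 83189, "CORP"],
     [7383, 8300, 72521, "FIN"],
     [260969003, 6262, 66611, "No Data"],
     [919832001, 22922, 90111, "IT"],
     [749277000, 81123, 53621, "FIN"],
     [3353, 6363, 99931, "No Data"]],
    columns=["D_ID", "D_NBR", "D_ID2", "D_HR_LVL"],
)
df2 = pd.DataFrame(
    {"U_ID": [851669006, 749277000, 749838000, 919832001, 260969003],
     "U_NBR": [851669, 749277, 788363, 919832, 260969]}
)

# Look up U_NBR (as a string) for every D_ID; IDs missing from df2 become NaN.
lookup = df2.set_index("U_ID")["U_NBR"].astype(str)
u_nbr = df1["D_ID"].map(lookup)

# Keep the looked-up value only for IT and FIN rows.
new_df = df1.copy()
new_df["U_NBR"] = u_nbr.where(new_df["D_HR_LVL"].isin(["IT", "FIN"]))

# For FIN rows, split U_NBR into its first 3 digits and the next 2 digits.
fin_nbr = new_df["U_NBR"].where(new_df["D_HR_LVL"].eq("FIN"))
new_df["N_Col_1"] = fin_nbr.str[:3]
new_df["N_Col_2"] = fin_nbr.str[3:5]

# Show empty cells instead of NaN, as in the desired output.
new_df = new_df.fillna("")
print(new_df)
```

Using map avoids renaming U_ID for the join, but it relies on U_ID being unique in df2; with duplicate U_ID values the merge-based version is the safer choice.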