Pandas: Reindexing only valid with uniquely valued Index objects

Time: 2018-11-06 00:48:42

Tags: python-2.7 pandas dataframe

I have the following two datasets in Python 2.7:

df1:

D_ID        D_NBR   D_ID    D_HR_LVL
851669006   8383    93433   IT
260969003   7337    83189   CORP
7383        8300    72521   FIN
260969003   6262    66611   No Data
919832001   22922   90111   IT
749277000   81123   53621   FIN
3353        6363    99931   No Data

df2:

U_ID        U_NBR
851669006   851669
749277000   749277
749838000   788363
919832001   919832
260969003   260969

Requirement:

if df1.D_HR_LVL == 'IT':
    get df2.U_NBR by matching df1.D_ID to df2.U_ID
elif df1.D_HR_LVL == 'FIN':
    split df2.U_NBR into 3 and 2 digits
else:
    keep the row as it is
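Read row by row, the requirement can be sketched in plain Python. This is only one interpretation: it assumes, as the target output suggests, that FIN rows also look up U_NBR and that the 3 + 2 digit split applies to that U_NBR. The function `resolve_row` and the dict `lookup` are hypothetical names used only for illustration.

```python
def resolve_row(d_id, hr_lvl, lookup):
    """Return (U_NBR, N_Col_1, N_Col_2) for one df1 row; '' means blank.

    lookup is a hypothetical dict mapping df2.U_ID -> df2.U_NBR.
    """
    if hr_lvl not in ('IT', 'FIN') or d_id not in lookup:
        # "else" branch, or no match in df2: nothing is added to the row
        return ('', '', '')
    u_nbr = str(lookup[d_id])
    if hr_lvl == 'IT':
        return (u_nbr, '', '')
    # FIN: split U_NBR into its first 3 digits and the next 2 digits
    return (u_nbr, u_nbr[:3], u_nbr[3:5])


# df2 from the question, as a plain dict U_ID -> U_NBR
lookup = {851669006: 851669, 749277000: 749277, 749838000: 788363,
          919832001: 919832, 260969003: 260969}

for d_id, lvl in [(851669006, 'IT'), (7383, 'FIN'),
                  (749277000, 'FIN'), (260969003, 'CORP')]:
    print('{} {} -> {}'.format(d_id, lvl, resolve_row(d_id, lvl, lookup)))
```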

What I tried:

import pandas as pd

# a1: D_ID where IT; a2: U_NBR looked up by D_ID; ncol: D_ID split 3 + rest for FIN
a1 = df1.D_ID.astype(str).where(df1.D_HR_LVL.eq("IT"))
a2 = df1.D_ID.map(df2.set_index('U_ID').U_NBR.astype(str))
ncol = (df1.D_ID.astype(str).str.extract(r'(\d{3})(\d+)')
        .where(df1.D_HR_LVL.eq("FIN"))
        .rename(columns=lambda x: 'N_COL{}'.format(x+1)))

mer_df = pd.concat([df1, a1, a2, ncol], axis=1)

But I get this error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The goal is to get the following output:

new_df:

D_ID        D_NBR   D_ID    D_HR_LVL  U_NBR     N_Col_1     N_Col_2
851669006   8383    93433   IT        851669
260969003   7337    83189   CORP      
7383        8300    72521   FIN
260969003   6262    66611   No Data
919832001   22922   90111   IT        919832
749277000   81123   53621   FIN       749277    749         27
3353        6363    99931   No Data

Any constructive help or suggestions would be much appreciated.

1 Answer:

Answer 0 (score: 2)

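The error is probably caused by df1 having two columns named "D_ID". If you rename them, for example with df1.columns = [u'D_ID', u'D_NBR', u'D_ID2', u'D_HR_LVL'], the code runs, but it does not give exactly the output you want.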
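A quick way to confirm the duplicated label is to rebuild df1 and inspect its columns. The sketch below types df1 in from the question's table and shows that selecting a duplicated label returns a two-column DataFrame instead of a Series, so the attempt's Series operations end up working on a DataFrame. The exact exception this triggers may differ between pandas versions.

```python
import pandas as pd

# df1 from the question: two columns share the label 'D_ID'
df1 = pd.DataFrame(
    [[851669006, 8383, 93433, 'IT'],
     [260969003, 7337, 83189, 'CORP'],
     [7383, 8300, 72521, 'FIN'],
     [260969003, 6262, 66611, 'No Data'],
     [919832001, 22922, 90111, 'IT'],
     [749277000, 81123, 53621, 'FIN'],
     [3353, 6363, 99931, 'No Data']],
    columns=['D_ID', 'D_NBR', 'D_ID', 'D_HR_LVL'])

# Labels that occur more than once
dup_labels = df1.columns[df1.columns.duplicated()].tolist()
print(dup_labels)  # ['D_ID']

# Selecting a duplicated label returns both columns as a DataFrame
before = df1['D_ID']
print(type(before).__name__)  # DataFrame

# Rename positionally, as suggested above
df1.columns = ['D_ID', 'D_NBR', 'D_ID2', 'D_HR_LVL']
print(type(df1['D_ID']).__name__)  # Series
```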

Here is another approach: first merge df1 with df2 on the common ID column, then change the values according to your requirements, for example:

# give the duplicated column its own name
df1.columns = [u'D_ID', u'D_NBR', u'D_ID2', u'D_HR_LVL']
# rename U_ID so both frames share the D_ID key, then left-merge on it
mer_df = df1.merge(df2.rename(columns={'U_ID':'D_ID'}), how='left').fillna('')

# "else" requirement: blank out U_NBR in rows whose D_HR_LVL is neither IT nor FIN
mer_df.loc[~mer_df.D_HR_LVL.isin(['IT','FIN']),'U_NBR'] = ''

# create the columns N_Col_1 and N_Col_2
mask_FIN = (mer_df.D_HR_LVL=='FIN') & (mer_df.U_NBR != '') # FIN rows that matched df2
mer_df.loc[mask_FIN,'N_Col_1'] = mer_df.loc[mask_FIN,'D_ID'].astype(str).str[:3]
mer_df.loc[mask_FIN,'N_Col_2'] = mer_df.loc[mask_FIN,'D_ID'].astype(str).str[3:5]

# rows outside mask_FIN are NaN in the new columns; blank them too
mer_df = mer_df.fillna('')

print (mer_df)
        D_ID  D_NBR  D_ID2 D_HR_LVL   U_NBR N_Col_1 N_Col_2
0  851669006   8383  93433       IT  851669                
1  260969003   7337  83189     CORP                        
2       7383   8300  72521      FIN                        
3  260969003   6262  66611  No Data                        
4  919832001  22922  90111       IT  919832                
5  749277000  81123  53621      FIN  749277     749      27
6       3353   6363  99931  No Data
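For comparison, here is a self-contained sketch that reaches the same table with `Series.map` instead of a merge. It assumes the duplicated column has already been renamed to D_ID2, that U_ID is unique in df2 (as in the sample), and, following the requirement text, it splits U_NBR rather than D_ID; for this data both give 749 and 27.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'D_ID': [851669006, 260969003, 7383, 260969003, 919832001, 749277000, 3353],
     'D_NBR': [8383, 7337, 8300, 6262, 22922, 81123, 6363],
     'D_ID2': [93433, 83189, 72521, 66611, 90111, 53621, 99931],
     'D_HR_LVL': ['IT', 'CORP', 'FIN', 'No Data', 'IT', 'FIN', 'No Data']},
    columns=['D_ID', 'D_NBR', 'D_ID2', 'D_HR_LVL'])
df2 = pd.DataFrame(
    {'U_ID': [851669006, 749277000, 749838000, 919832001, 260969003],
     'U_NBR': [851669, 749277, 788363, 919832, 260969]},
    columns=['U_ID', 'U_NBR'])

# Lookup Series U_ID -> U_NBR as text; map needs unique U_ID values
lookup = df2.set_index('U_ID')['U_NBR'].astype(str)

# U_NBR only for IT/FIN rows; unmatched or other rows stay blank
is_it_or_fin = df1['D_HR_LVL'].isin(['IT', 'FIN'])
u_nbr = df1['D_ID'].map(lookup).where(is_it_or_fin).fillna('')
out = df1.assign(U_NBR=u_nbr)

# FIN rows that found a match: first 3 and next 2 digits of U_NBR
is_fin = (out['D_HR_LVL'] == 'FIN') & (out['U_NBR'] != '')
out['N_Col_1'] = out['U_NBR'].str[:3].where(is_fin, '')
out['N_Col_2'] = out['U_NBR'].str[3:5].where(is_fin, '')

print(out)
```

Keeping the looked-up U_NBR as strings sidesteps one side effect of the left merge: an integer column that gains NaN for unmatched rows is upcast to float before fillna('') runs.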