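Pandas equivalent of VLOOKUP across multiple columns

Time: 2019-11-25 17:15:33

Tags: python pandas

I want to return the total_points value for each user listed under several user columns.

To explain more clearly, here is the first DataFrame as a dict: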

{'secondBoxer1': {0: 'Cody',
  1: 'Billy',
  2: 'Jennifer',
  3: 'Franc',
  4: 'Mark'},
 'secondBoxer2': {0: 'Tamis',
  1: 'Danye',
  2: 'Leesa',
  3: 'Hector',
  4: 'Coy'},
 'secondBoxer3': {0: 'Davin',
  1: 'Delbert',
  2: 'Kanisca',
  3: 'Luis',
  4: 'nan'},
 'secondBoxer4': {0: 'Caro',
  1: 'John',
  2: 'nan',
  3: 'Jose',
  4: 'nan'},
 'secondBoxer5': {0: 'Caro',
  1: 'Ryan',
  2: 'nan',
  3: 'Jose',
  4: 'nan'},
 'secondBoxer6': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'Luis', 4: 'nan'}}

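I have several secondBoxer columns (six in the sample above). For each secondBoxer column, I want to look up the name in that column in a different DataFrame and bring back the matching total_points value: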

    name            total_points
0   Hector            50.000
1   John              48.000
2   Jose              30.000
3   Luis              31.875
4   Billy             27.500 

In this case, the desired output would be:

secondBoxer1  total_points1  secondBoxer2  total_points2  ....
  Cody                          Tamis
  Billy          27.500         Danye
  Jennifer                      Leesa
  Franc                         Hector        50.000
  Mark                          Coy

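I tried a for loop that iterates over all the columns (the real dataset has more than 50 secondBoxer columns) and merges each one with the second dataset to get total_points, but it did not work.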

listen = ['secondBoxer1','secondBoxer2','secondBoxer3','secondBoxer4','secondBoxer5','secondBoxer6']
for i in listen:
    df=df.merge(df2[['name','total_points']],left_on=i,right_on='name')

But this returns an empty dataset.
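A likely reason for the empty result: `merge` defaults to `how='inner'`, so each pass keeps only the rows whose name in that column exists in the lookup frame, and after a couple of passes no row survives. The following is a minimal sketch of a left-merge loop, not a definitive fix. It uses a trimmed-down copy of the sample data, and the `total_pointsN` column naming is an assumption taken from the desired output above. It also assumes the names in the lookup frame are unique.

```python
import pandas as pd

# Trimmed-down sample data (two of the secondBoxer columns).
df = pd.DataFrame({
    "secondBoxer1": ["Cody", "Billy", "Jennifer", "Franc", "Mark"],
    "secondBoxer2": ["Tamis", "Danye", "Leesa", "Hector", "Coy"],
})
df2 = pd.DataFrame({
    "name": ["Hector", "John", "Jose", "Luis", "Billy"],
    "total_points": [50.0, 48.0, 30.0, 31.875, 27.5],
})

# A single inner merge already drops every row without a match.
inner_once = df.merge(df2, left_on="secondBoxer1", right_on="name")

boxer_cols = ["secondBoxer1", "secondBoxer2"]
result = df.copy()
for col in boxer_cols:
    suffix = col[len("secondBoxer"):]
    # Rename the lookup columns so each pass adds a distinct total_pointsN
    # column and no duplicate 'name' column is carried along.
    lookup = df2.rename(columns={"name": col,
                                 "total_points": f"total_points{suffix}"})
    # how='left' keeps every row of the left frame; unmatched names get NaN.
    result = result.merge(lookup, on=col, how="left")

print(result)
```

With `how='left'` the row count stays the same as in `df`, which is the behaviour a spreadsheet VLOOKUP would give.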

2 Answers:

Answer 0: (score: 2)

IIUC, use map and then concat:

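# out: the secondBoxer frame; df: the lookup frame with name and total_points
out1=out.apply(lambda x : x.map(dict(zip(df.name,df.total_points))))
# str.strip removes a set of characters from both ends, not a prefix; it works here because digits are not in that set
out1.columns='total_points'+out1.columns.str.strip('secondBoxer')
out=pd.concat([out,out1],axis=1)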

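Here we need argsort to reorder the columns by their numeric suffix (cast to int so that 10 sorts after 9, and use a stable sort so each name column stays ahead of its points column):

out=out.iloc[:,out.columns.str.extract(r'(\d+)')[0].astype(int).argsort(kind='stable')]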

out
Out[151]: 
  secondBoxer1  total_points1  ... secondBoxer6  total_points6
0         Cody            NaN  ...          nan            NaN
1        Billy           27.5  ...          nan            NaN
2     Jennifer            NaN  ...          nan            NaN
3        Franc            NaN  ...         Luis         31.875
4         Mark            NaN  ...          nan            NaN
[5 rows x 12 columns]
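For readers who want to run the map-then-concat idea end to end, here is a self-contained sketch under stated assumptions. The frame names `boxers` and `points` are hypothetical, only three of the sample columns are included, and the columns are interleaved with `zip` rather than with the `argsort` step used in the answer.

```python
import pandas as pd

# Hypothetical frame names; data copied from the question (three columns only).
boxers = pd.DataFrame({
    "secondBoxer1": ["Cody", "Billy", "Jennifer", "Franc", "Mark"],
    "secondBoxer2": ["Tamis", "Danye", "Leesa", "Hector", "Coy"],
    "secondBoxer6": ["nan", "nan", "nan", "Luis", "nan"],
})
points = pd.DataFrame({
    "name": ["Hector", "John", "Jose", "Luis", "Billy"],
    "total_points": [50.0, 48.0, 30.0, 31.875, 27.5],
})

# A Series indexed by name acts as the lookup table (names must be unique).
lookup = points.set_index("name")["total_points"]

# Map every name column through the lookup; names not found become NaN.
mapped = boxers.apply(lambda col: col.map(lookup))
mapped.columns = mapped.columns.str.replace("secondBoxer", "total_points",
                                            regex=False)

combined = pd.concat([boxers, mapped], axis=1)

# Interleave each name column with its points column.
order = [c for pair in zip(boxers.columns, mapped.columns) for c in pair]
combined = combined[order]

print(combined)
```

Pairing the columns with `zip` avoids parsing the number out of the column name, at the cost of relying on `boxers` and `mapped` having their columns in the same order.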

Answer 1: (score: 1)

Here is another way:

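# df1: the secondBoxer frame; df2: the lookup frame (assumes numpy as np and pandas as pd are imported)
s=df2.set_index('name')['total_points']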
final=df1.assign(**pd.DataFrame(np.where(df1.isin(s.index),df1.replace(s),np.nan)
                                ,columns=df1.columns.str[-1]).add_prefix('total_points'))
print(final[sorted(final.columns,key=lambda x: x[-1])])

  secondBoxer1 total_points1 secondBoxer2 total_points2 secondBoxer3  \
0         Cody           NaN        Tamis           NaN        Davin   
1        Billy          27.5        Danye           NaN      Delbert   
2     Jennifer           NaN        Leesa           NaN      Kanisca   
3        Franc           NaN       Hector            50         Luis   
4         Mark           NaN          Coy           NaN          nan   

  total_points3 secondBoxer4 total_points4 secondBoxer5 total_points5  \
0           NaN         Caro           NaN         Caro           NaN   
1           NaN         John            48         Ryan           NaN   
2           NaN          nan           NaN          nan           NaN   
3        31.875         Jose            30         Jose            30   
4           NaN          nan           NaN          nan           NaN   

  secondBoxer6 total_points6  
0          nan           NaN  
1          nan           NaN  
2          nan           NaN  
3         Luis        31.875  
4          nan           NaN
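One caveat on the last-character logic in this answer (`str[-1]` and `x[-1]`): it works for the six sample columns, but with the 50+ columns mentioned in the question, suffixes such as 10 and 20 would both reduce to "0". Below is a small sketch of a sort key that reads the whole trailing number. The helper name `numeric_suffix_key` is hypothetical, and the column names are illustrative.

```python
import re

# Illustrative column names, deliberately out of order.
shuffled = ["total_points10", "secondBoxer2", "total_points1",
            "secondBoxer10", "total_points2", "secondBoxer1"]


def numeric_suffix_key(name):
    """Sort by the full trailing number, then name column before points column."""
    number = int(re.search(r"(\d+)$", name).group(1))
    return (number, name.startswith("total_points"))


# Last-character key: "10" is treated as "0" and jumps to the front.
naive = sorted(shuffled, key=lambda x: x[-1])

# Full-number key: 1, 2, 10 with each pair kept together.
ordered = sorted(shuffled, key=numeric_suffix_key)

print(naive)
print(ordered)
```

The same key could be passed to `sorted(final.columns, key=...)` in place of the lambda above, provided every column name ends in digits.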