Pandas - Create column names from the values in a column

Date: 2017-09-14 15:21:07

Tags: python pandas

I want to create column names based on the values in a column.

This is what I have:

part_number  source  recent_date  recent_price
0023496      a1      2017-06-27   55.0
0023496      e1      2017-08-03   315.0
0023084      a1      2017-01-12   255.0
0023084      e1      NaN          NaN

This is the output I want:

part_number  a1_recent_date  a1_recent_price  e1_recent_date  e1_recent_price
0023496      2017-06-27      55.0             2017-08-03      315.0
0023084      2017-01-12      255.0            NaN             NaN

2 answers:

Answer 0 (score: 2)

Use set_index and unstack:

In [520]: dff = df.set_index(['part_number', 'source']).unstack()

In [521]: dff
Out[521]:
            recent_date             recent_price
source               a1          e1           a1     e1
part_number
23084        2017-01-12         NaN        255.0    NaN
23496        2017-06-27  2017-08-03         55.0  315.0

Then set the column names:

In [522]: dff.columns = dff.columns.map(lambda x: '{1}_{0}'.format(*x))

In [523]: dff
Out[523]:
            a1_recent_date e1_recent_date  a1_recent_price  e1_recent_price
part_number
23084           2017-01-12            NaN            255.0              NaN
23496           2017-06-27     2017-08-03             55.0            315.0

Details (the input df):

In [527]: df
Out[527]:
   part_number source recent_date  recent_price
0        23496     a1  2017-06-27          55.0
1        23496     e1  2017-08-03         315.0
2        23084     a1  2017-01-12         255.0
3        23084     e1         NaN           NaN
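The steps in this answer can be put together as one self-contained sketch. It rebuilds the question's sample data by hand; keeping part_number as a string (so the leading zeros survive) and the variable name wide are assumptions made for illustration, and the final reset_index is an optional extra step that turns part_number back into a regular column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question. part_number is stored as a string
# here (an assumption) so that the leading zeros are preserved.
df = pd.DataFrame({
    "part_number": ["0023496", "0023496", "0023084", "0023084"],
    "source": ["a1", "e1", "a1", "e1"],
    "recent_date": ["2017-06-27", "2017-08-03", "2017-01-12", np.nan],
    "recent_price": [55.0, 315.0, 255.0, np.nan],
})

# Move 'source' from the row index into the column index.
# The columns become (field, source) pairs such as ('recent_date', 'a1').
wide = df.set_index(["part_number", "source"]).unstack()

# Flatten each (field, source) pair into a single 'source_field' name.
wide.columns = ["{1}_{0}".format(*col) for col in wide.columns]

# Optional: turn part_number back into an ordinary column.
wide = wide.reset_index()
print(wide)
```

Note that unstack requires each (part_number, source) pair to be unique; with duplicate pairs it raises an error and the data would need to be aggregated first.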

Answer 1 (score: 1)

This will do it:

pd.concat([agg_df.add_prefix(index+'_').reset_index() 
           for index,agg_df  in df.groupby('source', as_index=False)],
           axis=1)  

Explanation:

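  1. Split the data frame into groups by the values of source: df.groupby('source', as_index=False)
  2. Iterate over those groups: for index,agg_df ...
  3. For each group, add the source value as a prefix and call reset_index: agg_df.add_prefix(index+'_').reset_index()
  4. Finally, concatenate all the groups back into one data frame: pd.concat([...])

Result: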

    In [46]: pd.concat([agg_df.add_prefix(index+'_').reset_index() 
        ...:            for index,agg_df  in df.groupby('source', as_index=False)],
        ...:            axis=1)  
    Out[46]: 
       index a1_part_number a1_source a1_recent_date a1_recent_price  index  \
    0      0        0023496        a1     2017-06-27            55.0      1   
    1      2        0023084        a1     2017-01-12           255.0      3   
    
      e1_part_number e1_source e1_recent_date e1_recent_price  
    0        0023496        e1     2017-08-03           315.0  
    1        0023084        e1            NaN             NaN  
    
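The result above still carries the index columns and the prefixed a1_part_number / a1_source columns, and it lines the groups up by position. A possible variant of the same groupby-and-concat idea, shown here only as a sketch, indexes each group by part_number before concatenating so that the rows are aligned by part number and the columns match the requested output. The names pieces, piece and wide are hypothetical, and the sample data is rebuilt by hand with part_number as a string.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (part_number kept as a string).
df = pd.DataFrame({
    "part_number": ["0023496", "0023496", "0023084", "0023084"],
    "source": ["a1", "e1", "a1", "e1"],
    "recent_date": ["2017-06-27", "2017-08-03", "2017-01-12", np.nan],
    "recent_price": [55.0, 315.0, 255.0, np.nan],
})

pieces = []
for source, group in df.groupby("source"):
    # Drop the grouping column, index the group by part_number,
    # then prefix the remaining columns with the source value.
    piece = (
        group.drop(columns="source")
             .set_index("part_number")
             .add_prefix(source + "_")
    )
    pieces.append(piece)

# Concatenate side by side; rows are aligned on the part_number index.
wide = pd.concat(pieces, axis=1).reset_index()
print(wide)
```

Because the alignment happens on the index, this variant assumes each part_number appears at most once per source.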