Concatenate columns after matching column names

Time: 2019-07-29 07:10:30

Tags: python pandas

I want to concatenate pairs of similarly named columns in a dataset. The columns look like this:

URO_Brand1_Target,URO_Brand1,URO_Brand2_Target,URO_Brand2,URO_Brand3_Target

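These columns may appear in the dataset in no particular order. The condition is: if the column name "URO_Brand1_Target" contains the column name "URO_Brand1", I have to concatenate the two columns. I need to do this for every such pair of columns.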

Something like this:

URO_Brand1_Target  URO_Brand1  Concatenate(URO_Brand1, URO_Brand1_Target)
95%                CIG0002069  CIG0002069,95%

1 answer:

Answer 0 (score: 0)

Use pandas.DataFrame.groupby. Suppose you have a DataFrame df:

  URO_Brand1_Target  URO_Brand1 URO_Brand2_Target  URO_Brand2
0               95%  something1               90%  something2
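
For anyone who wants to reproduce this, the sample frame above could be built roughly as follows. This is only a sketch: the values are the placeholders shown in the answer, not real data.

```python
import pandas as pd

# Placeholder values copied from the sample frame shown above.
df = pd.DataFrame({
    "URO_Brand1_Target": ["95%"],
    "URO_Brand1": ["something1"],
    "URO_Brand2_Target": ["90%"],
    "URO_Brand2": ["something2"],
})
print(df)
```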

Iterate over the groups with groupby:

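# Group the columns by their first 10 characters (e.g. "URO_Brand1").
# Transposing first avoids groupby(axis=1), which is deprecated in recent pandas.
for k, d in df.T.groupby(df.columns.str[:10]):
    tmp = d.T.sort_index(axis=1)  # base column sorts before its "_Target" column
    df['Concatenate(%s)' % ', '.join(d.index)] = tmp.apply(','.join, axis=1)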

Output:

  URO_Brand1_Target  URO_Brand1 URO_Brand2_Target  URO_Brand2  \
0               95%  something1               90%  something2   

  Concatenate(URO_Brand1_Target, URO_Brand1)  \
0                             something1,95%   

  Concatenate(URO_Brand2_Target, URO_Brand2)  
0                             something2,90%
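
Grouping on the first 10 characters only works while every brand prefix is exactly that long (a column such as URO_Brand10 would land in the URO_Brand1 group). As an alternative sketch, not part of the original answer, the pairs could be matched by the "_Target" suffix instead. The helper name `concat_target_pairs`, its parameters, and the "80%" value are hypothetical and only for illustration; the code assumes all column names are strings.

```python
import pandas as pd

def concat_target_pairs(df, suffix="_Target", sep=","):
    """Return a copy of df with one combined column per (base, base + suffix) pair.

    Hypothetical helper: assumes column names are strings and that a target
    column is named exactly like its base column plus `suffix`.
    """
    out = df.copy()
    for target_col in df.columns:
        if not target_col.endswith(suffix):
            continue
        base_col = target_col[: -len(suffix)]
        if base_col not in df.columns:
            continue  # e.g. URO_Brand3_Target without a URO_Brand3 column
        name = "Concatenate(%s, %s)" % (base_col, target_col)
        # Cast to str so non-string values can still be joined.
        out[name] = df[base_col].astype(str) + sep + df[target_col].astype(str)
    return out

# Columns deliberately out of order, plus an unmatched target column.
df = pd.DataFrame({
    "URO_Brand1_Target": ["95%"],
    "URO_Brand1": ["CIG0002069"],
    "URO_Brand3_Target": ["80%"],  # placeholder value, has no base column
})
result = concat_target_pairs(df)
print(result["Concatenate(URO_Brand1, URO_Brand1_Target)"].iloc[0])
```

Because the match is on the full base name rather than a fixed-length prefix, column order and brand-name length do not matter, and target columns without a partner are simply left alone.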