I want to concatenate pairs of similarly named columns in a dataset. The columns look like this:
URO_Brand1_Target,URO_Brand1,URO_Brand2_Target,URO_Brand2,URO_Brand3_Target
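They can appear in the dataset in any order. The condition is: if a column name such as "URO_Brand1_Target" contains another column name such as "URO_Brand1", the two columns must be concatenated. I need to do this for every such pair of columns.
Something like this: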
URO_Brand1_Target  URO_Brand1  Concatenate(URO_Brand1, URO_Brand1_Target)
95%                CIG0002069  CIG0002069,95%
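A minimal sketch of the stated condition, assuming every target column is named exactly "<base>_Target" (as in the names listed above). It pairs columns by stripping that suffix rather than by a plain substring test, because "URO_Brand1" is also a substring of a hypothetical "URO_Brand10_Target". The function name concat_target_pairs and the values other than 95% / CIG0002069 are illustrative placeholders, not from the question.

```python
import pandas as pd

SUFFIX = "_Target"  # assumption: every target column is "<base>_Target"


def concat_target_pairs(df: pd.DataFrame, sep: str = ",") -> pd.DataFrame:
    """Return a copy of df with one concatenated column per (base, base_Target) pair."""
    out = df.copy()
    for col in df.columns:
        if not col.endswith(SUFFIX):
            continue
        base = col[: -len(SUFFIX)]
        # Pair only when the base column exists; column order in df does not matter.
        if base in df.columns:
            name = f"Concatenate({base}, {col})"
            out[name] = df[base].astype(str) + sep + df[col].astype(str)
    return out


# Placeholder data using the column names from the question.
df = pd.DataFrame({
    "URO_Brand1_Target": ["95%"],
    "URO_Brand1": ["CIG0002069"],
    "URO_Brand2_Target": ["90%"],
    "URO_Brand2": ["something2"],
    "URO_Brand3_Target": ["80%"],  # no URO_Brand3 column, so it is skipped
})
result = concat_target_pairs(df)
print(result.loc[0, "Concatenate(URO_Brand1, URO_Brand1_Target)"])  # prints CIG0002069,95%
```

Columns with no matching partner (URO_Brand3_Target here) are left alone instead of raising an error.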
Answer (score: 0)
Use pandas.DataFrame.groupby. Suppose you have a df:
  URO_Brand1_Target  URO_Brand1 URO_Brand2_Target  URO_Brand2
0               95%  something1               90%  something2
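To reproduce the example, the df above can be built as follows (the values are the answer's own placeholders):

```python
import pandas as pd

# One row, two brand/target column pairs, matching the frame printed above.
df = pd.DataFrame({
    "URO_Brand1_Target": ["95%"],
    "URO_Brand1": ["something1"],
    "URO_Brand2_Target": ["90%"],
    "URO_Brand2": ["something2"],
})
print(df)
```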
Iterate over the column groups with groupby:
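# Group the columns by their first 10 characters (e.g. "URO_Brand1"). Grouping the
# transposed frame replaces groupby(..., axis=1), which recent pandas deprecates.
for k, d in df.T.groupby(df.columns.str[:10]):
    tmp = d.T.sort_index(axis=1)  # "URO_Brand1" sorts before "URO_Brand1_Target"
    # Join each row's values with "," (assumes the values are strings).
    df['Concatenate(%s)' % ', '.join(d.index)] = tmp.apply(','.join, axis=1)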
Output:
URO_Brand1_Target URO_Brand1 URO_Brand2_Target URO_Brand2 \
0 95% something1 90% something2
Concatenate(URO_Brand1_Target, URO_Brand1) \
0 something1,95%
Concatenate(URO_Brand2_Target, URO_Brand2)
0 something2,90%
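The grouping key df.columns.str[:10] is simply the first 10 characters of each column name. That works for the names shown, but a quick check with hypothetical two-digit brands (URO_Brand10 and URO_Brand10_Target, which are not in the original data) shows they would land in the same group as URO_Brand1:

```python
import pandas as pd

# Hypothetical column names: the URO_Brand10* entries exist only to show the collision.
cols = pd.Index(["URO_Brand1_Target", "URO_Brand1", "URO_Brand10_Target", "URO_Brand10"])

# Same key the loop uses: the first 10 characters of every name.
keys = cols.str[:10]
print(list(keys))  # ['URO_Brand1', 'URO_Brand1', 'URO_Brand1', 'URO_Brand1']
```

If brand numbers can exceed 9, pairing columns by stripping the "_Target" suffix is safer than slicing a fixed width.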