Question

I need to merge columns in a DataFrame.

The headers have similar names with different suffixes, for example:

A1 | A2 | A3 | B1 | B2 | B3

I want to end up with all of them merged into:

A | B

I have this line, which successfully merges a defined set of columns into one:

df['A'] = df[['A1','A2','A3']].apply(' '.join, axis=1)

The problem is that the headers are not consistent, since a given letter may appear with '1', '2', or '3', for example:

A1 | A2 | A3 | B2 | C1 | C2

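From the solutions I've seen, pandas won't reference columns that don't exist (selecting a missing label raises a KeyError), so I can't use the apply statement as a blanket command.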
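A minimal sketch of the failure and one way around it without try/except, assuming the plain `A1`/`B2` naming from the question: selecting a list that contains a missing label raises `KeyError`, while filtering the candidate names against `df.columns` first keeps only the ones that exist. The sample frame and the `wanted` list are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'A3': ['c'], 'B2': ['d']})

wanted = ['B1', 'B2', 'B3']  # hypothetical list of every possible B column

# Selecting every possible name fails because B1 and B3 do not exist
try:
    df[wanted]
except KeyError as err:
    print('KeyError:', err)

# Keep only the names that are actually present, then join as before
present = [col for col in wanted if col in df.columns]
df['B'] = df[present].apply(' '.join, axis=1)
print(df['B'].tolist())  # ['d']
```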

I can't picture a solution other than a nested list of try/except steps. If anyone has an idea, I'd appreciate it!

Update
Thanks for the solutions! In case anyone is interested, here's what worked for me:

Solution 1

# headers: the list of base names, e.g. ['A', 'B', 'C']; columns look like 'A[1]', 'A[2]'
for h in headers:
    cols = [col for col in df.columns if col.split('[')[0] == h]
    if cols == []:
        cols = [col for col in df.columns if col == h and col.split('[')[0] not in headers]
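A sketch of how Solution 1's loop might be completed, assuming (from the `split('[')` calls) that the real headers look like `A[1]`, `A[2]`, and that `headers` is the user's own list of base names. The sample data, the skip for unmatched names, and the final join line are illustrative additions, not part of the original.

```python
import pandas as pd

# Hypothetical data in the bracketed naming style Solution 1 implies
df = pd.DataFrame({'A[1]': ['a'], 'A[2]': ['b'], 'B[2]': ['c'], 'C': ['d']})
headers = ['A', 'B', 'C', 'D']  # assumed list of base names

for h in headers:
    # Columns whose text before '[' equals the base name (also matches a bare 'C')
    cols = [col for col in df.columns if col.split('[')[0] == h]
    if not cols:
        continue  # no such column in this frame, e.g. 'D'
    df[h] = df[cols].apply(' '.join, axis=1)

print(df[['A', 'B', 'C']].iloc[0].tolist())  # ['a b', 'c', 'd']
```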

Solution 2

df.groupby(df.columns.str.split('[').str[0],axis=1).agg(lambda x :' '.join(x.values.tolist()))
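`DataFrame.groupby(..., axis=1)` is deprecated as of pandas 2.1. A sketch of the same grouping that avoids `axis=1` by transposing, grouping the rows, and transposing back; the sample frame with bracketed headers is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A[1]': ['x', 'p'], 'A[2]': ['y', 'q'], 'B[2]': ['z', 'r']})

# Base name of each column: the text before '[' ('A[1]' -> 'A')
keys = df.columns.str.split('[').str[0]

# Transpose so the original columns become rows, group those rows by base name,
# join each group's strings per original row, then transpose back
out = df.T.groupby(keys).agg(' '.join).T
print(out)
```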

Answer 1

You can use the df.columns attribute to find the relevant columns:

a_cols = [col for col in df.columns if col[0] == 'A']

Then use that list as the input to apply:

df['A'] = df[a_cols].apply(' '.join, axis=1)
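One caveat with the `' '.join` approach, shown as a hedged sketch: if a row has a missing value (NaN) in one of the selected columns, `str.join` raises a `TypeError`, because NaN is a float. Dropping missing cells per row and casting the rest to `str` before joining avoids that; the sample frame is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': ['a', 'c'], 'A2': ['b', np.nan], 'B1': ['x', 'y']})
a_cols = [col for col in df.columns if col[0] == 'A']

# Drop missing cells in each row and cast the rest to str before joining
df['A'] = df[a_cols].apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)
print(df['A'].tolist())  # ['a b', 'c']
```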

Answer 2

For example, given the following DataFrame:

import pandas as pd
df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'B2': ['b'], 'B3': ['c']})

Use groupby on the columns (axis=1):

df.groupby(df.columns.str[0],axis=1).agg(lambda x :','.join(x.values.tolist()))
Out[282]: 
     A    B
0  a,b  b,c
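`df.columns.str[0]` groups by the first character only, so a header such as `AB1` would land in group `A`. A sketch that instead strips a trailing run of digits to get the group key, assuming headers follow the letters-then-digits pattern from the question; it collects each group with a dict comprehension rather than `groupby(axis=1)`. The sample frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'AB1': ['c'], 'B2': ['d']})

# Group key: header with trailing digits removed ('AB1' -> 'AB', 'B2' -> 'B')
keys = df.columns.str.replace(r'\d+$', '', regex=True)

# For each distinct key, select its columns with a boolean mask and join them
out = pd.DataFrame({k: df.loc[:, keys == k].apply(','.join, axis=1)
                    for k in keys.unique()})
print(out)
```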

Answer 3

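import string

import pandas as pd

df = pd.DataFrame(columns=['A1', 'A2', 'A3', 'B1', 'B2', 'C1'])

# Map each letter to the existing columns that start with it
new_cols = {}
for new_col in string.ascii_uppercase:
    new_cols[new_col] = [col for col in df.columns if col.startswith(new_col)]

# Join only the groups that have columns; letters with no match are skipped
for new_col, cols in new_cols.items():
    if cols:
        df[new_col] = df[cols].apply(' '.join, axis=1)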

