I have a DataFrame:
df_in = pd.DataFrame([[1,2,3,4,5,6,7,8,9]], columns=["ab","ef","cd","ij","klm","kln","ghw","ghx","klo"])
I have another DataFrame that defines the desired order:
df_order = pd.DataFrame([["ab","gh"],["cd","ij"],["ef","kl"]], columns=["col1","col2"])
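I want to rearrange the columns of df_in using df_order as follows.
For each row of df_order, first comes the column whose name appears in col1, followed by all columns whose names start with the string in col2 of that row. Then move on to the next row and repeat: the col1 column, then every column starting with the col2 string, and so on.
Expected output: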
df_out = pd.DataFrame([[1,7,8,3,4,2,5,6,9]], columns=["ab","ghw","ghx","cd","ij","ef","klm","kln","klo"])
How can I do this?
Answer 0 (score: 4)
Here is a solution you can try:
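from itertools import chain
# Create a numeric index for each key so the columns can be sorted by it later.
# Flattening df_order row by row gives: ab, gh, cd, ij, ef, kl.
order_ = {
    v: idx for idx, v in enumerate(chain.from_iterable(df_order.to_numpy()))
}
# Note: x[:2] assumes every key in df_order is exactly two characters long.
# sorted() is stable, so columns sharing a prefix keep their original order.
df_in.loc[:, sorted(df_in.columns, key=lambda x: order_[x[:2]])]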
   ab  ghw  ghx  cd  ij  ef  klm  kln  klo
0   1    7    8   3   4   2    5    6    9
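The solution above relies on every key being two characters long. If the keys in df_order can have different lengths, a prefix match with str.startswith may be more robust. The following is only a minimal sketch under that assumption; reorder_columns is a hypothetical helper name, not part of pandas, and it drops any column that matches no row of df_order.

```python
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9]],
    columns=["ab", "ef", "cd", "ij", "klm", "kln", "ghw", "ghx", "klo"],
)
df_order = pd.DataFrame(
    [["ab", "gh"], ["cd", "ij"], ["ef", "kl"]], columns=["col1", "col2"]
)


def reorder_columns(df, order):
    """Hypothetical helper: reorder df's columns row by row of `order`.

    For each row, take the column named in col1 (if present), then every
    column whose name starts with the string in col2.
    """
    ordered = []
    for exact, prefix in zip(order["col1"], order["col2"]):
        if exact in df.columns:
            ordered.append(exact)
        ordered.extend(c for c in df.columns if c.startswith(prefix))
    # Remove duplicates while preserving first-seen order, in case a
    # column matches more than one rule.
    ordered = list(dict.fromkeys(ordered))
    return df[ordered]


df_out = reorder_columns(df_in, df_order)
print(df_out)
```

On the sample data this yields the same column order as the expected output in the question. Whether unmatched columns should be dropped or appended at the end is a design choice the question does not specify.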