Merging more than two columns of the same DataFrame in pandas

Time: 2021-06-22 20:39:38

Tags: python pandas dataframe merge

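I am trying to reorganize the following DataFrame. The values in [VAR] 1-3 should be merged into a single column, in numerical order within each [GROUP]: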

VAR 1   VAR 2   VAR 3   GROUP
                3       [0-10]
1               3       [0-10]
1               3       [0-10]
1       2               [0-10]
        2               [0-10]
3               3       [10-20]
3       1               [10-20]
        1               [10-20]
        2               [10-20]
                2       [10-20]
                2       [10-20]

This is the final result I am trying to get:

VAR_MERGED  GROUP 
1           [0-10]
1           [0-10]
1           [0-10]
2           [0-10]
2           [0-10]
3           [0-10]
3           [0-10]
3           [0-10]
1           [10-20]
1           [10-20]
2           [10-20]
2           [10-20]
2           [10-20]
3           [10-20]
3           [10-20]
3           [10-20]

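I tried df['VAR_MERGED'] = df[['VAR 1', 'VAR 2', 'VAR 3']].agg('-'.join, axis=1), but I got an error saying a str was expected. The values in the VAR columns are all floats, so I don't understand why string values are required.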
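Here is a minimal sketch of why this happens, using a hypothetical two-row frame shaped like the data above. Each row reaches '-'.join as a Series of floats, and str.join accepts only strings, so Python raises TypeError. Converting the values to text lets the join run, but you still get one value per input row. The target layout needs one row per non-empty cell, which is a reshape and not a row-wise join.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame shaped like the question's data (blank cells are NaN).
df = pd.DataFrame({
    "VAR 1": [np.nan, 1.0],
    "VAR 2": [np.nan, np.nan],
    "VAR 3": [3.0, 3.0],
})
cols = ["VAR 1", "VAR 2", "VAR 3"]

# '-'.join receives each row as a Series of floats; str.join accepts only strings.
raised = False
try:
    df[cols].agg("-".join, axis=1)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting the non-empty values to text lets join run, but it still returns
# one string per input row, not one row per non-empty cell.
joined = df[cols].apply(lambda row: "-".join(str(v) for v in row.dropna()), axis=1)
print(joined.tolist())
```

The second row yields the single string '1.0-3.0'. The desired output would have two separate rows for it.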

2 answers:

Answer 0 (score: 1)

Input data:

>>> df
    VAR 1  VAR 2  VAR 3    GROUP  ANOTHER
0     NaN    NaN    3.0   [0-10]  another
1     1.0    NaN    3.0   [0-10]  another
2     1.0    NaN    3.0   [0-10]  another
3     1.0    2.0    NaN   [0-10]  another
4     NaN    2.0    NaN   [0-10]  another
5     3.0    NaN    3.0  [10-20]  another
6     3.0    1.0    NaN  [10-20]  another
7     NaN    1.0    NaN  [10-20]  another
8     NaN    2.0    NaN  [10-20]  another
9     NaN    NaN    2.0  [10-20]  another
10    NaN    NaN    2.0  [10-20]  another

You can use melt. To see what each step does, run the chain one call at a time (df.melt(...), then df.melt(...).dropna(), then df.melt(...).dropna().sort_values(...), and so on):

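# Identifier columns: everything that does not start with 'VAR' (GROUP, ANOTHER)
id_vars = df.columns[~df.columns.str.startswith('VAR')].tolist()

out = (
    df.melt(id_vars=id_vars, value_name='VAR_MERGED')  # stack VAR 1-3 into one column
      .dropna()                                        # drop the empty (NaN) cells
      .sort_values(['GROUP', 'VAR_MERGED'])            # order by group, then by value
      .reset_index(drop=True)
      [['VAR_MERGED'] + id_vars]                       # VAR_MERGED first; drops 'variable'
)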

Resulting output:

>>> out
    VAR_MERGED    GROUP  ANOTHER
0          1.0   [0-10]  another
1          1.0   [0-10]  another
2          1.0   [0-10]  another
3          2.0   [0-10]  another
4          2.0   [0-10]  another
5          3.0   [0-10]  another
6          3.0   [0-10]  another
7          3.0   [0-10]  another
8          1.0  [10-20]  another
9          1.0  [10-20]  another
10         2.0  [10-20]  another
11         2.0  [10-20]  another
12         2.0  [10-20]  another
13         3.0  [10-20]  another
14         3.0  [10-20]  another
15         3.0  [10-20]  another
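If you want to follow the step-by-step suggestion above, here is a self-contained sketch. It rebuilds the input frame from the table, with blank cells as NaN, and keeps each intermediate result in its own variable. The names melted and non_empty are illustrative only. The melt and dropna steps should account for the 16 rows in the output.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Input frame from the table above (blank cells in the question are NaN).
df = pd.DataFrame({
    "VAR 1": [nan, 1, 1, 1, nan, 3, 3, nan, nan, nan, nan],
    "VAR 2": [nan, nan, nan, 2, 2, nan, 1, 1, 2, nan, nan],
    "VAR 3": [3, 3, 3, nan, nan, 3, nan, nan, nan, 2, 2],
    "GROUP": ["[0-10]"] * 5 + ["[10-20]"] * 6,
    "ANOTHER": ["another"] * 11,
})

id_vars = df.columns[~df.columns.str.startswith("VAR")].tolist()

melted = df.melt(id_vars=id_vars, value_name="VAR_MERGED")  # 11 rows x 3 VAR columns
non_empty = melted.dropna()                                  # only the filled cells
out = (
    non_empty.sort_values(["GROUP", "VAR_MERGED"])
    .reset_index(drop=True)[["VAR_MERGED"] + id_vars]
)

print(len(melted), len(non_empty))
print(out)
```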

Answer 1 (score: 1)

One way, using set_index / stack:

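df = (
    df.set_index('GROUP')
    .stack()                         # one value per (GROUP, VAR column) cell
    .dropna()                        # keep only the filled cells
    .reset_index(-1, drop=True)      # drop the VAR column level, keep GROUP
    .reset_index(name='Var Merged')  # columns: GROUP, Var Merged
    .sort_values(['GROUP', 'Var Merged'], ignore_index=True)  # by group, then value
    .iloc[:, ::-1]                   # Var Merged first
)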
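One thing to check with this approach: after stack(), level 1 of the index holds the column names (VAR 1, VAR 2, VAR 3). Sorting on the index therefore orders the values by source column, not by value. The sketch below uses the question's data (VAR 1-3 and GROUP only) and compares the two orderings. The names stacked, by_index and by_value are illustrative. The extra dropna() is there because newer pandas versions may keep NaN in stack().

```python
import numpy as np
import pandas as pd

nan = np.nan
# Question's data (VAR 1-3 and GROUP only); blank cells are NaN.
df = pd.DataFrame({
    "VAR 1": [nan, 1, 1, 1, nan, 3, 3, nan, nan, nan, nan],
    "VAR 2": [nan, nan, nan, 2, 2, nan, 1, 1, 2, nan, nan],
    "VAR 3": [3, 3, 3, nan, nan, 3, nan, nan, nan, 2, 2],
    "GROUP": ["[0-10]"] * 5 + ["[10-20]"] * 6,
})

# Series indexed by (GROUP, column name); dropna() keeps only the filled cells.
stacked = df.set_index("GROUP").stack().dropna()

# Sorting on the index orders by GROUP, then by column name ('VAR 1' < 'VAR 2' < 'VAR 3').
by_index = stacked.sort_index(level=[0, 1])

# Sorting on the values orders by GROUP, then by the number itself.
by_value = (
    stacked.reset_index(level=-1, drop=True)  # drop the column-name level
    .reset_index(name="Var Merged")           # columns: GROUP, Var Merged
    .sort_values(["GROUP", "Var Merged"], ignore_index=True)
    .iloc[:, ::-1]                            # Var Merged first
)

print(by_index.tolist())
print(by_value)
```

In the [10-20] group, the index ordering starts with the two VAR 1 values (3, 3). The value ordering starts with 1, 1, as in the desired result.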