Pandas groupby and concatenate strings

Time: 2020-06-04 02:47:12

Tags: python pandas

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame(
    [
        ['1', 'x', 'a'],
        ['1', 'y', 'b'],
        ['1', 'z', 'c'],
        ['2', 'x', 'a'],
        ['2', 'y', 'b'],
        ['2', 'z', 'c']
    ], columns = ['one', 'two', 'three']
)

    one two three
0   1   x   a
1   1   y   b
2   1   z   c
3   2   x   a
4   2   y   b
5   2   z   c

I would like to get a DataFrame that looks like this:

    one     two plus three
0   1       x + a\ny + b\nz + c
1   2       x + a\ny + b\nz + c

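How can I do this? I tried df.sum(axis=1), but I can't figure out how to group df so that each set of 3 records is combined, with the columns summed horizontally and \n inserted between the rows.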

2 answers:

Answer 0 (score: 2)

Try using groupby with agg + join:

s = df[['two', 'three']].agg(' + '.join, axis=1).groupby(df['one']).agg('\n'.join).\
              to_frame('two + three').reset_index()
  one          two + three
0   1  x + a\ny + b\nz + c
1   2  x + a\ny + b\nz + c
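If the chained one-liner is hard to follow, the same idea can be sketched in two separate steps: first build the "two + three" string for each row, then join those strings within each group. The variable names `pairs` and `result` are my own and only for illustration; this is a minimal sketch assuming the columns hold plain strings, as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['1', 'x', 'a'],
        ['1', 'y', 'b'],
        ['1', 'z', 'c'],
        ['2', 'x', 'a'],
        ['2', 'y', 'b'],
        ['2', 'z', 'c']
    ], columns=['one', 'two', 'three']
)

# Step 1: combine the two columns row by row, e.g. 'x + a'
pairs = df['two'] + ' + ' + df['three']

# Step 2: group the combined strings by column 'one' and
# join each group's strings with a newline between them
result = (
    pairs.groupby(df['one'])
         .agg('\n'.join)
         .rename('two + three')   # hypothetical column name, pick your own
         .reset_index()
)

# Printing a single cell shows the real line breaks
print(result.loc[0, 'two + three'])
```

Because `'\n'.join` only places the separator between items, no trailing newline is left at the end of each cell.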

Answer 1 (score: 0)

import pandas as pd
df = pd.DataFrame(
    [
        ['1', 'x', 'a'],
        ['1', 'y', 'b'],
        ['1', 'z', 'c'],
        ['2', 'x', 'a'],
        ['2', 'y', 'b'],
        ['2', 'z', 'c']
    ], columns = ['one', 'two', 'three']
)


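# Build the "two + three" string for each row, ending each one with a newline
df['two_plus_three'] = df['two'] + ' + ' + df['three'] + '\n'
# Summing strings within each group concatenates them (each cell keeps a trailing '\n')
df.groupby('one')[['two_plus_three']].sum().reset_index()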

  one         two_plus_three
0   1  x + a\ny + b\nz + c\n
1   2  x + a\ny + b\nz + c\n
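Note that the result above ends every cell with an extra \n, which the desired output in the question does not have. One possible way to drop it, keeping the sum-based approach, is to strip the trailing newline afterwards with `Series.str.rstrip`. This is only a sketch; the variable name `out` is my own, and it assumes the column values never legitimately end in a newline.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['1', 'x', 'a'],
        ['1', 'y', 'b'],
        ['1', 'z', 'c'],
        ['2', 'x', 'a'],
        ['2', 'y', 'b'],
        ['2', 'z', 'c']
    ], columns=['one', 'two', 'three']
)

# Same as the answer: one string per row, each ending with a newline
df['two_plus_three'] = df['two'] + ' + ' + df['three'] + '\n'

# Concatenate the strings within each group
out = df.groupby('one')[['two_plus_three']].sum().reset_index()

# Remove the newline left over at the end of each concatenated cell
out['two_plus_three'] = out['two_plus_three'].str.rstrip('\n')

print(out)
```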