Join or aggregate strings belonging to the same group in pandas

Asked: 2018-02-19 21:51:04

Tags: python string pandas dataframe

Given a CSV like this, how can I combine the values of one column for rows that belong to the same person?

First,Last,Email,Group
Tim,Elfelt,tim@domain.com,Information Systems
Tim,Elfelt,tim@domain.com,Technology Training

The rows should be combined based on the Email column, giving this output:

First,Last,Email,Group
Tim,Elfelt,tim@domain.com,Information Systems;Technology Training

Edit: thanks to coldspeed, here is the working solution:

import pandas as pd
# Note: 'List' is not in the sample above; drop it from usecols if your file lacks that column.
data = (
    pd.read_csv('combinedemails.csv', encoding='utf-8',
                usecols=['First', 'Last', 'Email', 'Group', 'List'])
    .groupby(['First', 'Last', 'Email'])
    .Group.apply('; '.join).reset_index(name='Group')
)

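# index=False keeps the numeric row index out of the file, so the header matches the desired output
data.to_csv('output.csv', sep=',', encoding='utf-8', index=False)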

1 Answer:

Answer 0: (score: 5)

You can use groupby + str.join:

df.groupby(['First', 'Last', 'Email']).Group.apply('; '.join).reset_index(name='Group')

  First    Last           Email                                     Group
0   Tim  Elfelt  tim@domain.com  Information Systems; Technology Training
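
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same groupby + str.join approach. It assumes the sample rows from the question, uses io.StringIO in place of a real file, and joins with ';' (no space) to match the output requested in the question. The variable names csv_text, combined and result_csv are just for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the CSV file.
csv_text = """First,Last,Email,Group
Tim,Elfelt,tim@domain.com,Information Systems
Tim,Elfelt,tim@domain.com,Technology Training
"""

df = pd.read_csv(io.StringIO(csv_text))

# Group rows that share First/Last/Email, join their Group strings with ';',
# then turn the group keys back into ordinary columns.
combined = (
    df.groupby(['First', 'Last', 'Email'])
      .Group.apply(';'.join)
      .reset_index(name='Group')
)

# Serialize without the row index so the header matches the input layout.
result_csv = combined.to_csv(index=False)
print(result_csv)
```

Note that str.join only accepts strings, so if the Group column could contain missing values you would probably need to drop or fill them (for example with dropna or fillna) before grouping.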