Concatenate strings from multiple columns across multiple rows in pandas?

Time: 2018-11-08 20:28:18

Tags: pandas pandas-groupby pandas-apply

I have two DataFrames, as shown below:

import pandas as pd
df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
               'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
               'testName':   [4402, 3747 ,5555,8754,1234,9876,3602],
               'moduleName':   ['singing', 'dance','booze', 'vocals','drama','paint','singing'],
               'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED','WARNING','FAILED','WARNING'],
               'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8'],
               'Time_df1':['23:26:39','22:50:31','22:15:28','21:40:19','21:04:15','20:29:11','19:54:03']})

df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
               'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
               'Work':   ['AP', 'TC','OD', 'PU','NO','PM'],
               'Date':['2018-10-4','2018-10-6','2018-10-5','2018-10-7','2018-10-5','2018-10-10'],
               'Time_df2':['09:00:00','10:00:00','11:00:00','12:00:00','13:00:00','14:00:00']
               })

Now I merge the two frames as follows:

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on=['serialNo'],how='inner')
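
Since both frames carry a Date column that is not a join key, pandas should rename the two copies with its default suffixes `_x` and `_y`, which is where `Date_x` and `Date_y` in the next snippet come from. A minimal sketch with made-up two-row frames (`left` and `right` are illustrative names, not part of the original data):

```python
import pandas as pd

# Two tiny illustrative frames that share a non-key column named 'Date'.
left = pd.DataFrame({'serialNo': ['aaaa', 'bbbb'],
                     'Date': ['2018-10-05', '2018-10-06']})
right = pd.DataFrame({'serialNo': ['aaaa', 'bbbb'],
                      'Date': ['2018-10-04', '2018-10-06']})

# Overlapping non-key columns get the default suffixes '_x' (left) and '_y' (right).
merged = pd.merge(left, right, on=['serialNo'], how='inner')
print(list(merged.columns))
```

Passing `suffixes=('_df1', '_df2')` to `pd.merge` would give more descriptive names if preferred.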

I then filter by the date difference and group:

result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
result.drop(['Date_x','Date_y','Time_df1','Time_df2'],axis=1,inplace=True)
result = result.groupby(['serialNo'])['Food'].apply(','.join).reset_index()

But I want the output to look like this:

output = pd.DataFrame({'serialNo':['aaaa','bbbb','ffff'],
               'Name':['Sayonti,Sayonti,Sayonti','Ruchi','Gowtam'],
               'testName':   ['4402,4402,3602','3747','8754'],
               'moduleName':   ['singing,singing,singing', 'dance','vocals'],
               'endResult': ['WARNING,WARNING,WARNING','FAILED','FAILED'],
               'Food':['Strawberry,Pepsi,Pepsi','Coke','Nuts'],
               'Work':['AP,OD,OD','TC','PU']})

How can I achieve this? Basically, I need to figure out how to apply ','.join to multiple columns at once.

1 Answer:

Answer 0: (score: 0)

You can use:

result.groupby('serialNo').agg(list) #To get a list of values

Output:

                                 Name            testName  \
serialNo                                                    
aaaa      [Sayonti, Sayonti, Sayonti]  [4402, 4402, 3602]   
bbbb                          [Ruchi]              [3747]   
ffff                         [Gowtam]              [8754]   

                           moduleName                    endResult  \
serialNo                                                             
aaaa      [singing, singing, singing]  [WARNING, WARNING, WARNING]   
bbbb                          [dance]                     [FAILED]   
ffff                         [vocals]                     [FAILED]   

                                Food          Work  
serialNo                                            
aaaa      [Strawberry, Pepsi, Pepsi]  [AP, OD, OD]  
bbbb                          [Coke]          [TC]  
ffff                          [Nuts]          [PU]  

result.groupby('serialNo').agg(lambda x: ', '.join(x.astype(str))) #to get comma separated strings

Output:

                               Name          testName  \
serialNo                                                
aaaa      Sayonti, Sayonti, Sayonti  4402, 4402, 3602   
bbbb                          Ruchi              3747   
ffff                         Gowtam              8754   

                         moduleName                  endResult  \
serialNo                                                         
aaaa      singing, singing, singing  WARNING, WARNING, WARNING   
bbbb                          dance                     FAILED   
ffff                         vocals                     FAILED   

                              Food        Work  
serialNo                                        
aaaa      Strawberry, Pepsi, Pepsi  AP, OD, OD  
bbbb                          Coke          TC  
ffff                          Nuts          PU
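
Note that the output above is joined with ', ' (comma plus space) and keeps serialNo as the index, whereas the desired frame in the question uses ',' with no space and has serialNo as a regular column. A minimal sketch of that variant follows. It assumes a small stand-in frame (`merged`, built by hand from the values in the expected output, with some columns omitted for brevity) in place of the filtered merge result, and `join_values` is a hypothetical helper name:

```python
import pandas as pd

# Stand-in for the filtered, merged frame from the question.
# Values are copied from the expected output; the frame itself is illustrative.
merged = pd.DataFrame({
    'serialNo': ['aaaa', 'aaaa', 'aaaa', 'bbbb', 'ffff'],
    'Name': ['Sayonti', 'Sayonti', 'Sayonti', 'Ruchi', 'Gowtam'],
    'testName': [4402, 4402, 3602, 3747, 8754],
    'Food': ['Strawberry', 'Pepsi', 'Pepsi', 'Coke', 'Nuts'],
    'Work': ['AP', 'OD', 'OD', 'TC', 'PU'],
})

def join_values(col):
    # Cast to str first so integer columns such as testName can be joined.
    return ','.join(col.astype(str))

# Apply the same join to every non-key column, then turn serialNo
# back into an ordinary column, as in the desired output.
out = merged.groupby('serialNo').agg(join_values).reset_index()
print(out)
```

The `astype(str)` cast is what lets the numeric testName column go through the same join as the text columns; without it, `','.join` would raise a TypeError on integers.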