Aggregating a column in a groupby statement

Time: 2015-10-29 13:28:49

Tags: python pandas

When using groupby on a DataFrame, can I collect the values of a particular column into a list for each group?

I'm not sure whether this detail is relevant here, but PostgreSQL has a function array_agg(columnname) that serves the same purpose.

I also tried to find details in the API documentation, but without success.

train
Out[6]: 
    TripType  VisitNumber Weekday  ScanCount  DepartmentDescription
1         30            7  Friday          1                  SHOES
2         30            7  Friday          1          PERSONAL CARE
3         26            8  Friday          2  PAINT AND ACCESSORIES
4         26            8  Friday          2  PAINT AND ACCESSORIES
5         26            8  Friday          2  PAINT AND ACCESSORIES
6         26            8  Friday          1  PAINT AND ACCESSORIES
7         26            8  Friday          1  PAINT AND ACCESSORIES
8         26            8  Friday          1  PAINT AND ACCESSORIES
9         26            8  Friday         -1  PAINT AND ACCESSORIES
10        26            8  Friday          1            DSD GROCERY
11        26            8  Friday          2  PAINT AND ACCESSORIES
12        26            8  Friday          1  MEAT - FRESH & FROZEN
13        26            8  Friday          1  PAINT AND ACCESSORIES
14        26            8  Friday         -1  PAINT AND ACCESSORIES
15        26            8  Friday          2  PAINT AND ACCESSORIES
16        26            8  Friday          1  PAINT AND ACCESSORIES
17        26            8  Friday          1  PAINT AND ACCESSORIES
18        26            8  Friday          2                  DAIRY
19        26            8  Friday          1      PETS AND SUPPLIES

train.groupby(['VisitNumber','Weekday','TripType']).count()
Out[7]: 
                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                      2
8           Friday  26               17                     17

What I mean is that the result for the first grouped row should look like this:

                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                     [SHOES,PERSONAL CARE]

Dataset:

{'DepartmentDescription': {1: 'SHOES',
  2: 'PERSONAL CARE',
  3: 'PAINT AND ACCESSORIES',
  4: 'PAINT AND ACCESSORIES',
  5: 'PAINT AND ACCESSORIES'},
 'ScanCount': {1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
 'TripType': {1: 30, 2: 30, 3: 26, 4: 26, 5: 26},
 'VisitNumber': {1: 7, 2: 7, 3: 8, 4: 8, 5: 8},
 'Weekday': {1: 'Friday', 2: 'Friday', 3: 'Friday', 4: 'Friday', 5: 'Friday'}}

1 answer:

Answer 0: (score: 1)

IIUC, you want the following:

In [248]:
df.groupby(['VisitNumber','Weekday','TripType'])['DepartmentDescription'].apply(list)

Out[248]:
VisitNumber  Weekday  TripType
7            Friday   30                                     [SHOES, PERSONAL CARE]
8            Friday   26          [PAINT AND ACCESSORIES, PAINT AND ACCESSORIES,...
Name: DepartmentDescription, dtype: object
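
If you also want the ScanCount count next to the list, as in the desired output in the question, one possible approach is to pass a dict to agg instead of selecting a single column. The following is only a sketch built on the five-row sample dataset from the question; the names data, train and result are chosen for illustration, and it assumes a reasonably recent pandas version where the builtin list is accepted as an aggregation function.

```python
import pandas as pd

# Five-row sample dataset, copied from the question.
data = {
    'DepartmentDescription': {1: 'SHOES',
                              2: 'PERSONAL CARE',
                              3: 'PAINT AND ACCESSORIES',
                              4: 'PAINT AND ACCESSORIES',
                              5: 'PAINT AND ACCESSORIES'},
    'ScanCount': {1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
    'TripType': {1: 30, 2: 30, 3: 26, 4: 26, 5: 26},
    'VisitNumber': {1: 7, 2: 7, 3: 8, 4: 8, 5: 8},
    'Weekday': {1: 'Friday', 2: 'Friday', 3: 'Friday', 4: 'Friday', 5: 'Friday'},
}
train = pd.DataFrame(data)

# Count the ScanCount rows and collect DepartmentDescription into a list,
# one output row per (VisitNumber, Weekday, TripType) group.
result = train.groupby(['VisitNumber', 'Weekday', 'TripType']).agg(
    {'ScanCount': 'count', 'DepartmentDescription': list}
)
print(result)
```

Each column gets its own aggregation here, so the result stays a DataFrame rather than the Series returned by apply(list) above. This is roughly the pandas counterpart of combining count(...) and array_agg(...) in a single GROUP BY query.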