Is it possible to display information in pandas after using groupby?

Time: 2018-12-03 03:15:39

Tags: python pandas

I would like to know whether it is possible to display all the records in pandas after using groupby.

This is my dataframe:

    class_a class_b   doc_num   year
    0   BG      24    DOC0134   2018    
    1   BG      31    DOC0134   2018    
    2   BG      13    DOC0134   2018    
    3   HS      24    DOC0134   2018    
    4   HS      31    DOC0134   2018    
    5   HS      13    DOC0134   2018    
    6   HL      13    DOC0256   2018    
    7   HL      25    DOC0256   2018    
    8   BG      13    DOC0256   2018    
    9   BG      25    DOC0256   2018    

I used groupby to group the documents for the year 2018:

df_2018 = df.where(df.year == 2018).groupby(['year','class_b', 'class_a']).size().unstack(fill_value=0)
df_2018 = df_2018.replace(0, '', regex=True)
df_2018

and got a table like this:

        class_a BG  HL  HS
year    class_b         
    2018.0   13     2   1   1
         24     1       1
         25     1   1   
         31     1       1
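A side note on the filtering step: `df.where(...)` keeps non-matching rows and fills them with NaN, which forces integer columns such as `year` to float, and that is why the year level is printed as a float above. A minimal sketch of the same count table using boolean indexing instead, assuming the sample dataframe from the question (rebuilt here by hand so the snippet runs on its own):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class_a": ["BG", "BG", "BG", "HS", "HS", "HS", "HL", "HL", "BG", "BG"],
    "class_b": [24, 31, 13, 24, 31, 13, 13, 25, 13, 25],
    "doc_num": ["DOC0134"] * 6 + ["DOC0256"] * 4,
    "year": [2018] * 10,
})

# Boolean indexing drops non-matching rows instead of turning them into NaN,
# so 'year' and 'class_b' keep their integer dtype.
counts = (
    df[df["year"] == 2018]
    .groupby(["year", "class_b", "class_a"])
    .size()
    .unstack(fill_value=0)  # one column per class_a, 0 where a pair is absent
)
print(counts)
```

With this variant the row index stays integer, so the year is printed as `2018` rather than as a float.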

Then I created a dataframe to display the data in a table like this:

list_all2018 = [(list(i), v) for i, v in df_2018.stack().items()]  # iteritems() was removed in pandas 2.0
# convert the list to a dataframe
list_all2018 = pd.DataFrame(list_all2018, columns=["All_class", "count"])
list_all2018
cols = ['year', 'class_b', 'class_a']
s = df.where(df.year == 2018).groupby(cols).size().unstack(fill_value=0).stack()
L = [{'year': idx[0], 'all_class': list(idx[1:]), 'count': vals} for idx, vals in s.items()]
list_all2018 = pd.DataFrame(L)
list_all2018

This is the result:

     all_class count year
0   [13, BG]    2   2018
1   [13, HL]    1   2018
2   [13, HS]    1   2018
3   [24, BG]    1   2018
4   [24, HL]    0   2018
5   [24, HS]    1   2018
6   [25, BG]    1   2018
7   [25, HL]    1   2018
8   [25, HS]    0   2018
9   [31, BG]    1   2018
10  [31, HL]    0   2018
11  [31, HS]    1   2018

But I also want to print the document numbers behind each "count". This is my expected result:

  all_class   count year  doc_num
0   [13, BG]    2   2018  DOC0134 | DOC0256
1   [13, HL]    1   2018  DOC0256
2   [13, HS]    1   2018  DOC0134
3   [24, BG]    1   2018  DOC0134
4   [24, HL]    0   2018  
5   [24, HS]    1   2018  DOC0134
6   [25, BG]    1   2018  DOC0256
7   [25, HL]    1   2018  DOC0256
8   [25, HS]    0   2018
9   [31, BG]    1   2018  DOC0134
10  [31, HL]    0   2018
11  [31, HS]    1   2018

Thanks in advance.

1 answer:

Answer 0: (score: 1)

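I think you can use agg with a custom function. I use unstack followed by stack to fill in the class combinations that do not occur in the data: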

# A named function, so the aggregated column is labelled 'doc_number'
def doc_number(x):
    return '|'.join(x)
df.groupby(['year','class_a','class_b']).doc_num.agg(['count',doc_number]).\
      unstack(1,fill_value=0).\
         stack().\
           reset_index()
Out[451]: 
    year  class_b class_a  count       doc_number
0   2018       13      BG      2  DOC0134|DOC0256
1   2018       13      HL      1          DOC0256
2   2018       13      HS      1          DOC0134
3   2018       24      BG      1          DOC0134
4   2018       24      HL      0                0
5   2018       24      HS      1          DOC0134
6   2018       25      BG      1          DOC0256
7   2018       25      HL      1          DOC0256
8   2018       25      HS      0                0
9   2018       31      BG      1          DOC0134
10  2018       31      HL      0                0
11  2018       31      HS      1          DOC0134
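The output above puts `0` in `doc_number` for the combinations that never occur, whereas the question asks for an empty cell there. As a rough sketch of one way to get closer to the expected table, the snippet below swaps the unstack/stack step for a `reindex` over every (year, class_b, class_a) combination. This is a different technique from the one in the answer, and the names `join_docs` and `full_index` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class_a": ["BG", "BG", "BG", "HS", "HS", "HS", "HL", "HL", "BG", "BG"],
    "class_b": [24, 31, 13, 24, 31, 13, 13, 25, 13, 25],
    "doc_num": ["DOC0134"] * 6 + ["DOC0256"] * 4,
    "year": [2018] * 10,
})

cols = ["year", "class_b", "class_a"]


def join_docs(values):
    """Hypothetical helper: join the document numbers of one group."""
    return " | ".join(values)


# Count the rows and collect the document numbers for each observed group.
out = (
    df[df["year"] == 2018]
    .groupby(cols)["doc_num"]
    .agg(count="count", doc_num=join_docs)
)

# Every combination of the observed years, class_b and class_a values.
full_index = pd.MultiIndex.from_product(out.index.levels, names=cols)

# Missing combinations get count 0 and an empty doc_num.
out = (
    out.reindex(full_index)
    .fillna({"count": 0, "doc_num": ""})
    .astype({"count": int})
    .reset_index()
)

# Rebuild the all_class column used in the question.
out["all_class"] = [list(pair) for pair in zip(out["class_b"], out["class_a"])]
print(out[["all_class", "count", "year", "doc_num"]])
```

Reindexing keeps each column's fill value separate, so `count` can be filled with 0 while `doc_num` stays an empty string.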