I would like to know whether it is possible to show all records in pandas after using groupby.
Here is my DataFrame:
class_a class_b doc_num year
0 BG 24 DOC0134 2018
1 BG 31 DOC0134 2018
2 BG 13 DOC0134 2018
3 HS 24 DOC0134 2018
4 HS 31 DOC0134 2018
5 HS 13 DOC0134 2018
6 HL 13 DOC0256 2018
7 HL 25 DOC0256 2018
8 BG 13 DOC0256 2018
9 BG 25 DOC0256 2018
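For anyone who wants to reproduce this, the table above can be rebuilt as follows. This is a sketch that assumes `class_b` and `year` are integer columns; the question does not state the dtypes.

```python
import pandas as pd

# Sample data copied from the table above (dtypes are an assumption)
df = pd.DataFrame({
    "class_a": ["BG", "BG", "BG", "HS", "HS", "HS", "HL", "HL", "BG", "BG"],
    "class_b": [24, 31, 13, 24, 31, 13, 13, 25, 13, 25],
    "doc_num": ["DOC0134"] * 6 + ["DOC0256"] * 4,
    "year": [2018] * 10,
})
print(df)
```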
I used groupby to group the documents for 2018:
df_2018 = df.where(df.year == 2018).groupby(['year','class_b', 'class_a']).size().unstack(fill_value=0)
df_2018 = df_2018.replace(0, '', regex=True)
df_2018
and got a table like this:
class_a        BG HL HS
year   class_b
1971.0 13       2  1  1
       24       1     1
       25       1  1
       31       1     1
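A side note on the float year in the table above: `df.where` keeps every row and replaces those that fail the condition with NaN, which upcasts an integer `year` column to float. A minimal sketch of the difference from boolean indexing, assuming the real data contains other years (the 2017 row and `DOC0999` below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "class_a": ["BG", "HS", "HL"],
    "class_b": [13, 13, 25],
    "doc_num": ["DOC0134", "DOC0134", "DOC0999"],  # DOC0999 is hypothetical
    "year": [2018, 2018, 2017],                     # the 2017 row is hypothetical
})

masked = df.where(df.year == 2018)  # non-matching rows become all-NaN
filtered = df[df.year == 2018]      # non-matching rows are dropped

print(masked["year"].dtype)    # float64
print(filtered["year"].dtype)  # int64
```

With boolean indexing the group keys keep their integer dtype, so the year would print as 2018 rather than 2018.0.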
Then I created a DataFrame to show the data as a table like this:
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
list_all2018 = [(list(i), v) for i, v in df_2018.stack().items()]
#change list to dataframe
list_all2018 = pd.DataFrame(list_all2018, columns=["All_class", "count"])
list_all2018
cols = ['year', 'class_b', 'class_a']
s = df.where(df.year == 2018).groupby(cols).size().unstack(fill_value=0).stack()
L = [{'year': idx[0], 'all_class': list(idx[1:]), 'count': vals} for idx, vals in s.items()]
list_all2018 = pd.DataFrame(L)
list_all2018
This is the result:
all_class count year
0 [13, BG] 2 2018
1 [13, HL] 1 2018
2 [13, HS] 1 2018
3 [24, BG] 1 2018
4 [24, HL] 0 2018
5 [24, HS] 1 2018
6 [25, BG] 1 2018
7 [25, HL] 1 2018
8 [25, HS] 0 2018
9 [31, BG] 1 2018
10 [31, HL] 0 2018
11 [31, HS] 1 2018
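However, I also want to show the information behind "count", i.e. the document numbers that were counted. This is my expected result: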
all_class count year doc_num
0 [13, BG] 2 2018 DOC0134 | DOC0256
1 [13, HL] 1 2018 DOC0256
2 [13, HS] 1 2018 DOC0134
3 [24, BG] 1 2018 DOC0134
4 [24, HL] 0 2018
5 [24, HS] 1 2018 DOC0134
6 [25, BG] 1 2018 DOC0256
7 [25, HL] 1 2018 DOC0256
8 [25, HS] 0 2018
9 [31, BG] 1 2018 DOC0134
10 [31, HL] 0 2018
11 [31, HS] 1 2018 DOC0134
Thanks in advance.
Answer 0 (score: 1)
I think you can use agg with a custom function. I am using unstack and stack to fill in the combinations that do not occur in the data:
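# Join the document numbers of each group into one string
def doc_number(x):
    return '|'.join(x)

(df.groupby(['year', 'class_a', 'class_b'])['doc_num']
   .agg(['count', doc_number])
   .unstack(1, fill_value=0)  # class_a to columns; missing combinations become 0
   .stack()                   # class_a back into the index
   .reset_index())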
Out[451]:
year class_b class_a count doc_number
0 2018 13 BG 2 DOC0134|DOC0256
1 2018 13 HL 1 DOC0256
2 2018 13 HS 1 DOC0134
3 2018 24 BG 1 DOC0134
4 2018 24 HL 0 0
5 2018 24 HS 1 DOC0134
6 2018 25 BG 1 DOC0256
7 2018 25 HL 1 DOC0256
8 2018 25 HS 0 0
9 2018 31 BG 1 DOC0134
10 2018 31 HL 0 0
11 2018 31 HS 1 DOC0134
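If the empty combinations should show a blank instead of 0 in the document column, as in the question's expected result, one possible variant is to aggregate first and then reindex against every combination. This is a sketch rather than the answer's method: it assumes `class_b` is an integer column, uses named aggregation and `MultiIndex.from_product`, and joins distinct document numbers only.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption)
df = pd.DataFrame({
    "class_a": ["BG", "BG", "BG", "HS", "HS", "HS", "HL", "HL", "BG", "BG"],
    "class_b": [24, 31, 13, 24, 31, 13, 13, 25, 13, 25],
    "doc_num": ["DOC0134"] * 6 + ["DOC0256"] * 4,
    "year": [2018] * 10,
})

cols = ["year", "class_b", "class_a"]

# Count rows and join the distinct document numbers per observed combination
out = df.groupby(cols)["doc_num"].agg(
    count="count",
    doc_num=lambda s: " | ".join(sorted(s.unique())),
)

# Every year/class_b/class_a combination, including unobserved ones
full_index = pd.MultiIndex.from_product(out.index.levels, names=cols)

out = out.reindex(full_index)
out["count"] = out["count"].fillna(0).astype(int)  # unobserved -> 0
out["doc_num"] = out["doc_num"].fillna("")         # unobserved -> blank
out = out.reset_index()

# Combine class_b and class_a into one list-valued column
out["all_class"] = out[["class_b", "class_a"]].values.tolist()
out = out[["all_class", "count", "year", "doc_num"]]
print(out)
```

Unlike the `unstack`/`stack` round trip, `reindex` leaves the missing rows as NaN, so each column can get its own fill value.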