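Can I use pivot_table in pandas to produce my desired output (shown below), or something similar, from the following dataset? I am trying something like:
pd.pivot_table(df, index=['region'], columns=['area', 'distributor', 'salesrep'],
               aggfunc='sum', margins=True).stack(['area', 'distributor', 'salesrep'])
But I only get subtotals for each region; if I move area from columns to index, I only get subtotals for each area.
Dataset: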
region   area         distributor    salesrep  sales  invoice_count
Central  Butterworth  HIN MARKETING  TLS         500             25
Central  Butterworth  HIN MARKETING  TLS         500             25
Central  Butterworth  HIN MARKETING  OSE         500             25
Central  Butterworth  HIN MARKETING  OSE         500             25
Central  Butterworth  KWANG HENGG    TCS         500             25
Central  Butterworth  KWANG HENGG    TCS         500             25
Central  Butterworth  KWANG HENG     LBH         500             25
Central  Butterworth  KWANG HENG     LBH         500             25
Central  Ipoh         SGH EDERAN     CHAN        500             25
Central  Ipoh         SGH EDERAN     CHAN        500             25
Central  Ipoh         SGH EDERAN     KAMACHI     500             25
Central  Ipoh         SGH EDERAN     KAMACHI     500             25
Central  Ipoh         CORE SYN       LILIAN      500             25
Central  Ipoh         CORE SYN       LILIAN      500             25
Central  Ipoh         CORE SYN       TEOH        500             25
Central  Ipoh         CORE SYN       TEOH        500             25
East     JB           LEI WAH        NF05        500             25
East     JB           LEI WAH        NF05        500             25
East     JB           LEI WAH        NF06        500             25
East     JB           LEI WAH        NF06        500             25
East     JB           WONDER F&B     SEREN       500             25
East     JB           WONDER F&B     SEREN       500             25
East     JB           WONDER F&B     MONC        500             25
East     JB           WONDER F&B     MONC        500             25
East     PJ           PENGEDAR       NORM        500             25
East     PJ           PENGEDAR       NORM        500             25
East     PJ           PENGEDAR       SIMON       500             25
East     PJ           PENGEDAR       SIMON       500             25
East     PJ           HEBAT          OGI         500             25
East     PJ           HEBAT          OGI         500             25
East     PJ           HEBAT          MIGI        500             25
East     PJ           HEBAT          MIGI        500             25
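To try the code in this thread without a data file, the dataset above can be rebuilt in memory. This is only a minimal sketch: the names combos and rows are illustrative, and it relies on the observation that every combination in the post appears exactly twice with sales=500 and invoice_count=25. The spelling KWANG HENGG on the TCS rows is copied from the post as-is.

```python
import pandas as pd

# One entry per distinct (region, area, distributor, salesrep) in the post.
# "KWANG HENGG" is reproduced exactly as it appears in the posted data.
combos = [
    ("Central", "Butterworth", "HIN MARKETING", "TLS"),
    ("Central", "Butterworth", "HIN MARKETING", "OSE"),
    ("Central", "Butterworth", "KWANG HENGG", "TCS"),
    ("Central", "Butterworth", "KWANG HENG", "LBH"),
    ("Central", "Ipoh", "SGH EDERAN", "CHAN"),
    ("Central", "Ipoh", "SGH EDERAN", "KAMACHI"),
    ("Central", "Ipoh", "CORE SYN", "LILIAN"),
    ("Central", "Ipoh", "CORE SYN", "TEOH"),
    ("East", "JB", "LEI WAH", "NF05"),
    ("East", "JB", "LEI WAH", "NF06"),
    ("East", "JB", "WONDER F&B", "SEREN"),
    ("East", "JB", "WONDER F&B", "MONC"),
    ("East", "PJ", "PENGEDAR", "NORM"),
    ("East", "PJ", "PENGEDAR", "SIMON"),
    ("East", "PJ", "HEBAT", "OGI"),
    ("East", "PJ", "HEBAT", "MIGI"),
]

# Every combination occurs twice, each time with sales=500, invoice_count=25.
rows = [combo + (500, 25) for combo in combos for _ in range(2)]

df = pd.DataFrame(
    rows,
    columns=["region", "area", "distributor", "salesrep",
             "sales", "invoice_count"],
)
```

The resulting df has the same columns as the posted table, so it can stand in for the file-based df used in the answers.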
Desired output:
region       area           distributor        salesrep             invoice_count  sales
Grand Total                                                                   800  16000
Central      Central Total                                                    400   8000
Central      Butterworth    Butterworth Total                                 200   4000
Central      Butterworth    HIN MARKETING      HIN MARKETING Total            100   2000
Central      Butterworth    HIN MARKETING      OSE                             50   1000
Central      Butterworth    HIN MARKETING      TLS                             50   1000
Central      Butterworth    KWANG HENG         KWANG HENG Total               100   2000
Central      Butterworth    KWANG HENG         LBH                             50   1000
Central      Butterworth    KWANG HENG         TCS                             50   1000
Central      Ipoh           Ipoh Total                                        200   4000
Central      Ipoh           CORE SYN           CORE SYN Total                 100   2000
Central      Ipoh           CORE SYN           LILIAN                          50   1000
Central      Ipoh           CORE SYN           TEOH                            50   1000
Central      Ipoh           SGH EDERAN         SGH EDERAN Total               100   2000
Central      Ipoh           SGH EDERAN         CHAN                            50   1000
Central      Ipoh           SGH EDERAN         KAMACHI                         50   1000
East         East Total                                                       400   8000
East         JB             JB Total                                          200   4000
East         JB             LEI WAH            LEI WAH Total                  100   2000
East         JB             LEI WAH            NF05                            50   1000
East         JB             LEI WAH            NF06                            50   1000
East         JB             WONDER F&B         WONDER F&B Total               100   2000
East         JB             WONDER F&B         MONC                            50   1000
East         JB             WONDER F&B         SEREN                           50   1000
East         PJ             PJ Total                                          200   4000
East         PJ             HEBAT              HEBAT Total                    100   2000
East         PJ             HEBAT              MIGI                            50   1000
East         PJ             HEBAT              OGI                             50   1000
East         PJ             PENGEDAR           PENGEDAR Total                 100   2000
East         PJ             PENGEDAR           NORM                            50   1000
East         PJ             PENGEDAR           SIMON                           50   1000
Answer 0 (score: 1)
We can use groupby instead of pivot_table:
import numpy as np
import pandas as pd


def label(ser):
    """Return a subtotal label such as 'Central Total'."""
    return '{s} Total'.format(s=ser)


filename = 'data.txt'
df = pd.read_table(filename, delimiter='\t')

# Grand total row; total_rank is used later to sort it to the top
total = pd.DataFrame({'region': ['Grand Total'],
                      'invoice_count': df['invoice_count'].sum(),
                      'sales': df['sales'].sum()})
total['total_rank'] = 1

# Subtotals at each level; the label goes in the next column to the right.
# numeric_only=True keeps the string columns out of the sums.
region_total = df.groupby(['region'], as_index=False).sum(numeric_only=True)
region_total['area'] = region_total['region'].apply(label)
region_total['region_rank'] = 1

area_total = df.groupby(
    ['region', 'area'], as_index=False).sum(numeric_only=True)
area_total['distributor'] = area_total['area'].apply(label)
area_total['area_rank'] = 1

dist_total = df.groupby(
    ['region', 'area', 'distributor'], as_index=False).sum(numeric_only=True)
dist_total['salesrep'] = dist_total['distributor'].apply(label)

rep_total = df.groupby(
    ['region', 'area', 'distributor', 'salesrep'], as_index=False).sum()

# UNION the DataFrames into one DataFrame
result = pd.concat([total, region_total, area_total, dist_total, rep_total])

# Replace NaNs with empty strings
result.fillna({'region': '', 'area': '', 'distributor': '', 'salesrep': ''},
              inplace=True)

# Reorder the rows (np.lexsort treats the last key as the primary key)
sorter = np.lexsort((
    result['distributor'].rank(),
    result['area_rank'].rank(),
    result['area'].rank(),
    result['region_rank'].rank(),
    result['region'].rank(),
    result['total_rank'].rank()))
result = result.take(sorter)
result = result.reindex(
    columns=['region', 'area', 'distributor', 'salesrep',
             'invoice_count', 'sales'])
print(result.to_string(index=False))
which yields
region       area           distributor        salesrep             invoice_count  sales
Grand Total                                                                   800  16000
Central      Central Total                                                    400   8000
Central      Butterworth    Butterworth Total                                 200   4000
Central      Butterworth    HIN MARKETING      HIN MARKETING Total            100   2000
Central      Butterworth    HIN MARKETING      OSE                             50   1000
Central      Butterworth    HIN MARKETING      TLS                             50   1000
Central      Butterworth    KWANG HENG         KWANG HENG Total               100   2000
Central      Butterworth    KWANG HENG         LBH                             50   1000
Central      Butterworth    KWANG HENG         TCS                             50   1000
Central      Ipoh           Ipoh Total                                        200   4000
Central      Ipoh           CORE SYN           CORE SYN Total                 100   2000
Central      Ipoh           CORE SYN           LILIAN                          50   1000
Central      Ipoh           CORE SYN           TEOH                            50   1000
Central      Ipoh           SGH EDERAN         SGH EDERAN Total               100   2000
Central      Ipoh           SGH EDERAN         CHAN                            50   1000
Central      Ipoh           SGH EDERAN         KAMACHI                         50   1000
East         East Total                                                       400   8000
East         JB             JB Total                                          200   4000
East         JB             LEI WAH            LEI WAH Total                  100   2000
East         JB             LEI WAH            NF05                            50   1000
East         JB             LEI WAH            NF06                            50   1000
East         JB             WONDER F&B         WONDER F&B Total               100   2000
East         JB             WONDER F&B         MONC                            50   1000
East         JB             WONDER F&B         SEREN                           50   1000
East         PJ             PJ Total                                          200   4000
East         PJ             HEBAT              HEBAT Total                    100   2000
East         PJ             HEBAT              MIGI                            50   1000
East         PJ             HEBAT              OGI                             50   1000
East         PJ             PENGEDAR           PENGEDAR Total                 100   2000
East         PJ             PENGEDAR           NORM                            50   1000
East         PJ             PENGEDAR           SIMON                           50   1000
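The same idea (one groupby per level, then a sort that puts each total above its children) might be written as a loop so that it works for any number of levels. The following is a rough sketch, not the answer's own method: the function name with_subtotals and its parameters levels, values and grand_label are made up for illustration, and instead of the rank columns and np.lexsort used above it sorts on helper columns in which total rows carry an empty string, so they sort ahead of every real name. It assumes the level columns hold non-empty strings.

```python
import pandas as pd


def with_subtotals(df, levels, values, grand_label="Grand Total"):
    """Return leaf rows plus one 'X Total' row per prefix of `levels`.

    Hypothetical helper; assumes the level columns contain non-empty strings.
    """
    pieces = []
    for depth in range(len(levels) + 1):
        keys = levels[:depth]
        if keys:
            part = df.groupby(keys, as_index=False)[values].sum()
        else:
            # depth 0: a single grand-total row
            part = df[values].sum().to_frame().T
        # Sort keys: real names for grouped levels, "" for the rest,
        # so a total row sorts just above the rows it summarises.
        for col in levels:
            part["_sort_" + col] = part[col] if col in keys else ""
        # Display label goes in the first level that was not grouped on.
        if depth == 0:
            part[levels[0]] = grand_label
        elif depth < len(levels):
            part[levels[depth]] = part[keys[-1]] + " Total"
        pieces.append(part)

    out = pd.concat(pieces, ignore_index=True)
    out = out.sort_values(["_sort_" + c for c in levels])
    out = out.reset_index(drop=True)
    out[levels] = out[levels].fillna("")
    return out[levels + values]
```

With the thread's data this would be called as with_subtotals(df, ['region', 'area', 'distributor', 'salesrep'], ['invoice_count', 'sales']), and the result printed with to_string(index=False) as in the answer.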
Answer 1 (score: 0)
I don't know how to get the subtotals inside the table, but if you run
df.pivot_table(index=['region', 'area', 'distributor', 'salesrep'],
               aggfunc='sum', margins=True)
you get
                                            invoice_count  sales
region  area        distributor   salesrep
Central Butterworth HIN MARKETING OSE                  50   1000
                                  TLS                  50   1000
                    KWANG HENG    LBH                  50   1000
                    KWANG HENGG   TCS                  50   1000
        Ipoh        CORE SYN      LILIAN               50   1000
                                  TEOH                 50   1000
                    SGH EDERAN    CHAN                 50   1000
                                  KAMACHI              50   1000
East    JB          LEI WAH       NF05                 50   1000
                                  NF06                 50   1000
                    WONDER F&B    MONC                 50   1000
                                  SEREN                50   1000
        PJ          HEBAT         MIGI                 50   1000
                                  OGI                  50   1000
                    PENGEDAR      NORM                 50   1000
                                  SIMON                50   1000
All                                                   800  16000
If you want totals based on, say, region and area, you can run
df.pivot_table(index=['region', 'area'], aggfunc='sum', margins=True)
which results in
                     invoice_count  sales
region  area
Central Butterworth            200   4000
        Ipoh                   200   4000
East    JB                     200   4000
        PJ                     200   4000
All                            800  16000
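Since each call gives the totals for one choice of index levels, one way to get every level of subtotal is to run pivot_table once per prefix of the level list and keep the results side by side. The sketch below is an illustration under stated assumptions rather than part of the answer: the four-row frame is a cut-down stand-in for the thread's data, the names levels and tables are hypothetical, and margins_name is used only to relabel the default All row.

```python
import pandas as pd

# Cut-down stand-in for the thread's data: one row per (region, area).
df = pd.DataFrame({
    "region": ["Central", "Central", "East", "East"],
    "area": ["Butterworth", "Ipoh", "JB", "PJ"],
    "sales": [500, 500, 500, 500],
    "invoice_count": [25, 25, 25, 25],
})

levels = ["region", "area"]

# One pivot table per prefix of `levels`: depth 1 groups by region only,
# depth 2 groups by region and area. Each table ends with a margins row.
tables = {
    depth: df.pivot_table(
        index=levels[:depth],
        values=["invoice_count", "sales"],
        aggfunc="sum",
        margins=True,
        margins_name="Grand Total",
    )
    for depth in range(1, len(levels) + 1)
}

for depth, table in tables.items():
    print(table)
    print()
```

This keeps the subtotals in separate tables; interleaving them into a single table still needs a concat-and-sort step like the one in the groupby answer.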