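I have a pivot-table DataFrame whose columns hold counts of reason codes, broken down by date range and location.
Ideally, I'd like to insert a row under each location showing what percentage of that location's total each reason code represents.
So if column MS = 6 and the row total is 52, that column in the row directly below would show 11.5%.
I could also put this in a second column if that makes more sense. Here is the code I'm currently using: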
import pandas as pd

# cursor and cnxn are an existing DB cursor and connection (setup not shown)
issue_query = """
select distinct int(obhssq), (obwhid), obrtrc, SUBSTRING(int(obivdt), 3, 2) as inv_month, ((SUBSTRING(int(obivdt), 3, 2) - 1) / 3 + 1) as inv_quarter,
int(obivdt)
from hsdet where obrtrc != '' and obrtrc != 'TX' and obivdt > 170000 and obivdt < 990000 and obwhid in ('01', '03', '05', '06', '07', '08', '09', '11', '12')
"""
cursor.execute(issue_query)
hedrows = cursor.fetchall()
total_issues = len(hedrows)  # number of rows returned by the query
issue_df = pd.read_sql(issue_query, cnxn)
issue_df.rename(columns={'00001' : 'Invoices', 'OBWHID' : 'Warehouse', 'OBRTRC':'Reason', 'INV_MONTH':'Month', 'INV_QUARTER':'Quarter', '00006':'Date'}, inplace=True)
pivoted = pd.pivot_table(issue_df, index=["Warehouse", "Quarter"], values=["Invoices"], columns=['Reason'], aggfunc='count', fill_value=0)
pivoted['Total'] = pivoted.sum(axis=1)  # row totals
pivoted.loc['Total'] = pivoted.sum()    # column totals
print(pivoted)
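As an aside, `pd.pivot_table` can compute the row and column totals itself through its `margins` and `margins_name` parameters, instead of the two manual `sum` lines. A minimal sketch on a small hypothetical `issue_df` (the rows below are made up for illustration, not the poster's data):

```python
import pandas as pd

# Hypothetical stand-in for issue_df, with only the columns the pivot uses
issue_df = pd.DataFrame({
    "Invoices":  [101, 102, 103, 104, 105],
    "Warehouse": ["01", "01", "03", "03", "03"],
    "Quarter":   [1, 1, 1, 1, 1],
    "Reason":    ["CE", "CS", "CE", "CE", "NC"],
})

# margins=True appends a "Total" column (per-row counts) and a "Total" row
# (per-reason counts); the bottom-right cell is the grand total
pivoted = pd.pivot_table(
    issue_df,
    index=["Warehouse", "Quarter"],
    values=["Invoices"],
    columns=["Reason"],
    aggfunc="count",
    fill_value=0,
    margins=True,
    margins_name="Total",
)
print(pivoted)
```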
Here is my current output:
Reason CE CS DG DR IC IO IP IW LC LD NC NO PB QC QW SC WH TTL
(01, 1) 9 4 4 0 1 8 7 5 0 0 17 5 2 2 2 2 0 68
(03, 1) 14 3 1 0 1 3 2 2 0 0 7 9 10 0 0 2 1 55
(05, 1) 4 2 1 0 3 1 5 1 1 0 4 1 0 1 2 1 0 27
(06, 1) 11 1 0 0 0 0 0 2 0 0 2 2 2 0 0 0 0 20
(07, 1) 0 5 0 0 0 4 1 0 0 0 1 1 0 0 0 0 0 12
(08, 1) 3 2 1 1 0 4 2 0 1 0 3 2 8 0 0 1 0 28
(09, 1) 6 1 0 1 0 0 0 0 0 0 2 0 2 0 0 1 0 13
(11, 1) 0 0 6 0 2 2 8 1 0 0 4 4 0 1 11 0 0 39
(12, 1) 10 3 1 0 0 1 9 0 0 1 2 6 0 0 0 0 0 33
Total 57 21 14 2 7 23 34 11 2 1 42 30 24 4 15 7 1 295
I'd like to insert rows like this:
Invoices \
Reason CE CS DG DR IC IO IP IW LC LD NC NO PB QC QW SC WH TTL
(01, 1) 9 4 4 0 1 8 7 5 0 0 17 5 2 2 2 2 0 68
%age 13% 6% 6% 0% 1% 12% 10% 7% 0% 0% 25% 7% 3% 3% 3% 3% 0% 23%
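The desired `%age` row divides each count by that row's own total: 9 of 68 is about 13%, consistent with the 6-of-52 ≈ 11.5% example, while the TTL cell is the row's share of the grand total (68 of 295 ≈ 23%). A minimal pandas sketch of building such rows, assuming flat (single-level) columns, a `TTL` column holding each row's total, and a grand total of 295 taken from the `Total` row; the frame copies only a few columns of two rows from the output above, and the `%age` label simply mirrors the desired layout:

```python
import pandas as pd

# Two rows copied (in part) from the output above; TTL is the total across
# all reasons, so the columns shown here need not add up to it
counts = pd.DataFrame(
    {"CE": [9, 14], "CS": [4, 3], "NC": [17, 7], "TTL": [68, 55]},
    index=["(01, 1)", "(03, 1)"],
)
grand_total = 295  # TTL value of the "Total" row above

# Each reason as a percentage of that row's TTL
pct = counts.drop(columns="TTL").div(counts["TTL"], axis=0) * 100
# TTL itself as a percentage of the grand total
pct["TTL"] = counts["TTL"] / grand_total * 100
pct = pct.round(0).astype(int).astype(str) + "%"

# Interleave: each count row followed by its percentage row
pieces = []
for label in counts.index:
    pieces.append(counts.loc[[label]].astype(str))
    pieces.append(pct.loc[[label]].rename(index={label: "%age"}))
result = pd.concat(pieces)
print(result)
```

Converting the counts to strings keeps every column a single type once the count and percentage rows are mixed.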
Thanks!
Answer 0 (score: 1)
Not sure this is the most elegant solution, but it works:
Input:
Reason CE CS
0 (01,1) 1 3
1 (02,1) 4 1
2 (03,1) 3 7
3 (04,1) 2 5
4 (05,1) 0 4
5 total 10 20
Here is the code:
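import pandas as pd

def calc_percent(x, total_values):
    # x is one row of df: the label first, then the counts
    x_array = x.values
    x_values = x_array[1:]
    new_x = [x_array[0]] + [str(100.0*x_values[i]/total_values[i]) + "%" for i in range(len(x_values))]
    # Returning a Series (not an array) makes apply(axis=1) expand it into columns
    return pd.Series(new_x, index=x.index)

total_row = df.iloc[-1, :]           # df is the input frame above; last row holds the totals
total_values = total_row.values[1:]  # skip the label column
rest_df = df.iloc[:-1, :]            # all rows except the total
new_df = rest_df.apply(lambda x: calc_percent(x, total_values), axis=1)
# DataFrame.append was removed in pandas 2.0, so use pd.concat; the stable
# (mergesort) sort keeps each count row directly above its percentage row
final_df = pd.concat([rest_df, new_df]).sort_values("Reason", kind="mergesort")
final_df.loc[final_df.shape[0]] = total_row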
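Note that this code divides each count by its column total (the 10.0% under CE for (01,1) is 1 of 10), whereas the row the question asks for divides by the row total (9 of 68 ≈ 13%). If column shares are what you want, the percentage rows can also be built without a row-wise `apply` by dividing every count column by the total row in one step. A sketch, assuming as in the example that the first column holds the labels and the last row holds the totals; the frame rebuilds the sample input above:

```python
import pandas as pd

# The answer's sample input: a label column, count columns, totals last
df = pd.DataFrame({
    "Reason": ["(01,1)", "(02,1)", "(03,1)", "(04,1)", "(05,1)", "total"],
    "CE": [1, 4, 3, 2, 0, 10],
    "CS": [3, 1, 7, 5, 4, 20],
})

count_cols = df.columns[1:]  # every column except the label
total_row = df.iloc[-1]      # the "total" row
rest_df = df.iloc[:-1]       # the per-location rows

# Divide all count columns by their column totals at once (no row-wise apply)
pct = rest_df[count_cols].mul(100).div(total_row[count_cols].astype(float))
pct = pct.round(1).astype(str) + "%"
pct.insert(0, "Reason", rest_df["Reason"])

# Both frames share the index 0..4, so a stable sort on the index puts each
# percentage row right after its count row; then re-append the total row
final_df = pd.concat([rest_df, pct]).sort_index(kind="mergesort")
final_df = pd.concat([final_df, df.iloc[[-1]]])
print(final_df)
```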
Output:
Reason CE CS
0 (01,1) 1 3
0 (01,1) 10.0% 15.0%
1 (02,1) 4 1
1 (02,1) 40.0% 5.0%
2 (03,1) 3 7
2 (03,1) 30.0% 35.0%
3 (04,1) 2 5
3 (04,1) 20.0% 25.0%
4 (05,1) 0 4
4 (05,1) 0.0% 20.0%
10 total 10 20
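The question also mentions putting the percentage in a second column instead of an extra row. A minimal sketch of that layout, assuming flat columns and a `TTL` column with each row's total; the two rows are copied (in part) from the question's output, and the `CE %`-style column names are made up for illustration:

```python
import pandas as pd

# Two rows copied (in part) from the question's output
counts = pd.DataFrame(
    {"CE": [9, 14], "CS": [4, 3], "TTL": [68, 55]},
    index=["(01, 1)", "(03, 1)"],
)

# Percentage of each row's TTL, one column per reason
pct = counts.drop(columns="TTL").div(counts["TTL"], axis=0).mul(100).round(1)
pct.columns = [f"{c} %" for c in pct.columns]

# Place each "<reason> %" column directly after its count column
order = []
for c in counts.columns.drop("TTL"):
    order += [c, f"{c} %"]
side_by_side = counts.join(pct)[order + ["TTL"]]
print(side_by_side)
```

Keeping the percentages numeric in this layout (rather than formatting them as strings) leaves them usable for further sorting or plotting.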