Add a calculated row beneath each row in a pandas DataFrame

Time: 2017-02-02 14:53:06

Tags: python pandas

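I have a pivot-table DataFrame whose columns hold counts of reason codes for a date range and location.

Ideally, I would like to insert a row beneath each location showing what percentage of that location's total each reason code represents.

So if column MS = 6 and the total for that row is 52, that column in the row directly beneath would read 11.5%.

If it makes more sense, I could also do this as a second column. Here is the code I am currently using: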

issue_query = """
select distinct int(obhssq), (obwhid), obrtrc, SUBSTRING(int(obivdt), 3, 2) as inv_month, ((SUBSTRING(int(obivdt), 3, 2) - 1) / 3 + 1) as inv_quarter,
int(obivdt)
from hsdet where obrtrc != '' and obrtrc != 'TX' and obivdt > 170000 and obivdt < 990000 and obwhid in ('01', '03', '05', '06', '07', '08', '09', '11', '12')

"""
# Assumes `import pandas as pd`, an open database connection `cnxn`, and a cursor `cursor` created from it.
cursor.execute(issue_query)
hedrows = cursor.fetchall()
total_issues = len(hedrows)  # number of rows returned by the query
issue_df = pd.read_sql(issue_query, cnxn)
# Replace the driver-generated column names with readable labels
issue_df.rename(columns={'00001' : 'Invoices', 'OBWHID' : 'Warehouse', 'OBRTRC':'Reason', 'INV_MONTH':'Month', 'INV_QUARTER':'Quarter', '00006':'Date'}, inplace=True)
# Count invoices per reason code for each (warehouse, quarter) pair
pivoted = pd.pivot_table(issue_df, index=["Warehouse", "Quarter"], values=["Invoices"], columns=['Reason'], aggfunc='count', fill_value=0)
pivoted['Total'] = pivoted.sum(axis=1)  # row totals
pivoted.loc['Total'] = pivoted.sum()    # column totals
print(pivoted)

This is my current output:

Reason  CE  CS  DG  DR  IC  IO  IP  IW  LC  LD  NC  NO  PB  QC  QW  SC  WH  TTL
(01, 1) 9   4   4   0   1   8   7   5   0   0   17  5   2   2   2   2   0   68
(03, 1) 14  3   1   0   1   3   2   2   0   0   7   9   10  0   0   2   1   55
(05, 1) 4   2   1   0   3   1   5   1   1   0   4   1   0   1   2   1   0   27
(06, 1) 11  1   0   0   0   0   0   2   0   0   2   2   2   0   0   0   0   20
(07, 1) 0   5   0   0   0   4   1   0   0   0   1   1   0   0   0   0   0   12
(08, 1) 3   2   1   1   0   4   2   0   1   0   3   2   8   0   0   1   0   28
(09, 1) 6   1   0   1   0   0   0   0   0   0   2   0   2   0   0   1   0   13
(11, 1) 0   0   6   0   2   2   8   1   0   0   4   4   0   1   11  0   0   39
(12, 1) 10  3   1   0   0   1   9   0   0   1   2   6   0   0   0   0   0   33
Total   57  21  14  2   7   23  34  11  2   1   42  30  24  4   15  7   1   295   

I would like to insert rows as follows:

    Invoices                                                                \   
Reason  CE  CS  DG  DR  IC  IO  IP  IW  LC  LD  NC  NO  PB  QC  QW  SC  WH  TTL
(01, 1) 9   4   4   0   1   8   7   5   0   0   17  5   2   2   2   2   0   68
%age    13% 6%  6%  0%  1%  12% 10% 7%  0%  0%  25% 7%  3%  3%  3%  3%  0%  23%

Thanks!

1 answer:

Answer 0 (score: 1)

I'm not sure this is the most elegant solution, but it does work:

Input:

   Reason  CE  CS
0  (01,1)   1   3
1  (02,1)   4   1
2  (03,1)   3   7
3  (04,1)   2   5
4  (05,1)   0   4
5   total  10  20
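
To try the answer yourself, the input table above can be rebuilt as a DataFrame. This is a minimal sketch; it assumes the `Reason` column holds plain strings and the counts are integers, which the post does not state.

```python
import pandas as pd

# Rebuild the sample input shown above; the last row holds the column totals.
df = pd.DataFrame(
    {
        "Reason": ["(01,1)", "(02,1)", "(03,1)", "(04,1)", "(05,1)", "total"],
        "CE": [1, 4, 3, 2, 0, 10],
        "CS": [3, 1, 7, 5, 4, 20],
    }
)
print(df)
```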

Here is the code:

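import pandas as pd

def calc_percent(x, total_values):
    # x is one row: the first element is the label, the rest are counts
    x_array = x.values
    x_values = x_array[1:]
    new_x = [x_array[0]] + [str(100.0*x_values[i]/total_values[i]) +"%" for i in range(len(x_values))]
    # Returning a Series (not an ndarray) makes apply() build a DataFrame with the same columns
    return pd.Series(new_x, index=x.index)

total_row = df.iloc[-1,:]
total_values = total_row.values[1:]
rest_df = df.iloc[:-1, :]
new_df = rest_df.apply(lambda x: calc_percent(x,total_values),axis=1)
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead;
# a stable sort keeps each percentage row directly below its count row
final_df = pd.concat([rest_df, new_df]).sort_values("Reason", kind="mergesort")
final_df.loc[final_df.shape[0]] = total_row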

Output:

    Reason     CE     CS
0   (01,1)      1      3
0   (01,1)  10.0%  15.0%
1   (02,1)      4      1
1   (02,1)  40.0%   5.0%
2   (03,1)      3      7
2   (03,1)  30.0%  35.0%
3   (04,1)      2      5
3   (04,1)  20.0%  25.0%
4   (05,1)      0      4
4   (05,1)   0.0%  20.0%
10   total     10     20
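
Note that the percentages above are shares of each column total (1 out of 10 gives 10.0%), whereas the question asks for each count as a share of its own row total (6 out of 52 gives 11.5%). Below is a minimal sketch of the row-wise variant. It assumes a frame of plain counts with the location labels as the index and no total row or column; `add_row_percentages` is a hypothetical helper name, and the sample numbers in the usage note are made up.

```python
import pandas as pd


def add_row_percentages(counts: pd.DataFrame) -> pd.DataFrame:
    """Return `counts` with a percentage row inserted below every row.

    Each percentage is the cell's share of its own row total.
    Hypothetical helper for illustration only.
    """
    row_totals = counts.sum(axis=1)
    # Divide each row by its own total; an all-zero row yields NaN, shown as 0.
    pct = counts.div(row_totals, axis=0).fillna(0).mul(100).round(1)
    pct_text = pct.astype(str) + "%"

    # Stack counts on top of percentages, then interleave by position:
    # rows 0..n-1 are counts and rows n..2n-1 are the matching percentages.
    stacked = pd.concat([counts, pct_text])
    n = len(counts)
    order = [i for pair in zip(range(n), range(n, 2 * n)) for i in pair]
    return stacked.iloc[order]
```

For example, calling it on `pd.DataFrame({"MS": [6, 1], "CE": [46, 3]}, index=["(01, 1)", "(03, 1)"])` places an `11.5%` / `88.5%` row under the first location. Each percentage row reuses the label of the row above it, as in the output shown earlier, and the affected columns become object dtype because they mix integers and strings.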