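I'm fairly new to Python and Pandas. With the help of Google and StackOverflow I've been able to figure out most things, but this one has me stumped. I have a dataframe that looks like this: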
                      SalesPerson 1              SalesPerson 2              SalesPerson 3
                      Revenue  Number of Orders  Revenue  Number of Orders  Revenue  Number of Orders
In Process Stage 1       8347                 8     9941                 5     5105                 7
In Process Stage 2       3879                 2     3712                 3     1350                10
In Process Stage 3       7885                 4     6513                 8     2218                 2
Won Not Invoiced         4369                 1     1736                 5     4950                 9
Won Invoiced             7169                 5     5308                 3     9832                 2
Lost to Competitor       8780                 1     3836                 7     2851                 3
Lost to No Action        2835                 5     4653                 1     1270                 2
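For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. It assumes the two header rows form a two-level column MultiIndex (salesperson, metric) and that the stage names are the row index; the real frame may be laid out differently.

```python
import pandas as pd

# Assumption: stage names are the row index.
stages = [
    "In Process Stage 1", "In Process Stage 2", "In Process Stage 3",
    "Won Not Invoiced", "Won Invoiced",
    "Lost to Competitor", "Lost to No Action",
]

# Assumption: the two header rows are a (salesperson, metric) MultiIndex.
columns = pd.MultiIndex.from_product(
    [["SalesPerson 1", "SalesPerson 2", "SalesPerson 3"],
     ["Revenue", "Number of Orders"]]
)

# Values copied row by row from the table above.
values = [
    [8347, 8, 9941, 5, 5105, 7],
    [3879, 2, 3712, 3, 1350, 10],
    [7885, 4, 6513, 8, 2218, 2],
    [4369, 1, 1736, 5, 4950, 9],
    [7169, 5, 5308, 3, 9832, 2],
    [8780, 1, 3836, 7, 2851, 3],
    [2835, 5, 4653, 1, 1270, 2],
]

df = pd.DataFrame(values, index=stages, columns=columns)
print(df)
```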
I'd like to add subtotal rows for "In Process", "Won", and "Lost", so that my data looks like this:
                      SalesPerson 1              SalesPerson 2              SalesPerson 3
                      Revenue  Number of Orders  Revenue  Number of Orders  Revenue  Number of Orders
In Process Stage 1       8347                 8     9941                 5     5105                 7
In Process Stage 2       3879                 2     3712                 3     1350                10
In Process Stage 3       7885                 4     6513                 8     2218                 2
In Process Subtotal     20111                14    20166                16     8673                19
Won Not Invoiced         4369                 1     1736                 5     4950                 9
Won Invoiced             7169                 5     5308                 3     9832                 2
Won Subtotal            11538                 6     7044                 8    14782                11
Won Percent               27%               23%      20%               25%      54%               31%
Lost to Competitor       8780                 1     3836                 7     2851                 3
Lost to No Action        2835                 5     4653                 1     1270                 2
Lost Subtotal           11615                 6     8489                 8     4121                 5
Lost Percent              27%               23%      24%               25%      15%               14%
Total                   43264                26    35699                32    27576                35
Here is my code so far:
def create_win_lose_table(dataframe):
    in_process_stagename_list = {'In Process Stage 1', 'In Process Stage 2', 'In Process Stage 3'}
    won_stagename_list = {'Won Invoiced', 'Won Not Invoiced'}
    lost_stagename_list = {'Lost to Competitor', 'Lost to No Action'}
    temp_Pipeline_df = dataframe.copy()
    for index, row in temp_Pipeline_df.iterrows():
        if index not in in_process_stagename_list:
            temp_Pipeline_df.drop([index], inplace=True)
    Pipeline_sum = temp_Pipeline_df.sum()
    # at the end I was going to concat the sum to the original dataframe, but that's where I'm stuck
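I've only just started on the In Process part of the dataframe. My thinking is that once I figure this out, I can repeat the process for the Won and Lost categories. Any ideas or approaches are welcome.
Thanks! Jon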
Answer 0: (score: 1)
Here is a simple example for you.
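import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(5, 5))
# Sum each column and turn the result into a one-row DataFrame
df_total = df.sum().to_frame().T
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_with_totals = pd.concat([df, df_total])
df_with_totals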
          0         1         2         3         4
0  0.743746  0.668769  0.894739  0.947641  0.753029
1  0.236587  0.862352  0.329624  0.637625  0.288876
2  0.817637  0.250593  0.363517  0.572789  0.785234
3  0.140941  0.221746  0.673470  0.792831  0.170667
4  0.965435  0.836577  0.790037  0.996000  0.229857
0  2.904346  2.840037  3.051388  3.946885  2.227662
You can give the summary row whatever label you like using rename in Pandas.
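Applying the same idea to the table in the question, here is a rough sketch, not a definitive solution. It assumes the columns are a (salesperson, metric) MultiIndex and the stage names are the row index. The `groups` dict is a hypothetical helper introduced here for illustration. Each summary row gets its label from `Series.rename` before being turned into a one-row frame, and the Percent rows (Won and Lost only, as in the desired output) are each subtotal's share of the grand total.

```python
import pandas as pd

# Data from the question; the MultiIndex column layout is an assumption.
columns = pd.MultiIndex.from_product(
    [["SalesPerson 1", "SalesPerson 2", "SalesPerson 3"],
     ["Revenue", "Number of Orders"]]
)
df = pd.DataFrame(
    [[8347, 8, 9941, 5, 5105, 7],
     [3879, 2, 3712, 3, 1350, 10],
     [7885, 4, 6513, 8, 2218, 2],
     [4369, 1, 1736, 5, 4950, 9],
     [7169, 5, 5308, 3, 9832, 2],
     [8780, 1, 3836, 7, 2851, 3],
     [2835, 5, 4653, 1, 1270, 2]],
    index=["In Process Stage 1", "In Process Stage 2", "In Process Stage 3",
           "Won Not Invoiced", "Won Invoiced",
           "Lost to Competitor", "Lost to No Action"],
    columns=columns,
)

# Hypothetical helper: category name -> the stage rows that belong to it.
groups = {
    "In Process": ["In Process Stage 1", "In Process Stage 2", "In Process Stage 3"],
    "Won": ["Won Not Invoiced", "Won Invoiced"],
    "Lost": ["Lost to Competitor", "Lost to No Action"],
}

# Grand total over all stage rows; rename() sets the future row label.
total = df.sum().rename("Total")

pieces = []
for name, stage_names in groups.items():
    block = df.loc[stage_names]
    subtotal = block.sum().rename(f"{name} Subtotal")
    pieces.append(block)
    pieces.append(subtotal.to_frame().T)
    if name != "In Process":
        # Share of the grand total, formatted as a whole-number percentage.
        percent = (subtotal / total).map("{:.0%}".format).rename(f"{name} Percent")
        pieces.append(percent.to_frame().T)
pieces.append(total.to_frame().T)

result = pd.concat(pieces)
print(result)
```

Because the Percent rows are stored as formatted strings, the columns of `result` end up with object dtype. If the numbers are needed for further calculation, it may be better to keep the percentages as floats and format them only when displaying.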