I am trying to compute sales percentages using the total sales for each MultiIndex entry. My DataFrame looks like this:
local categoria fabricante tipo consistencia peso pacote ordem vendas_kg
AREA I SABAO ASATP DILUIDO LIQUIDO 1501 A 2000g PLASTICO 1 10
AREA I SABAO TEPOS DILUIDO LIQUIDO 1501 A 2000g PLASTICO 1 20
AREA I SABAO ASATP CAPSULA LIQUIDO 1501 A 2000g PLASTICO 1 20
AREA I SABAO TEPOS CAPSULA LIQUIDO 1501 A 2000g PLASTICO 1 30
AREA I SABAO ASATP DILUIDO LIQUIDO 1501 A 2000g PLASTICO 2 20
AREA I SABAO TEPOS DILUIDO LIQUIDO 1501 A 2000g PLASTICO 2 30
AREA I SABAO ASATP CAPSULA LIQUIDO 1501 A 2000g PLASTICO 2 20
AREA I SABAO TEPOS CAPSULA LIQUIDO 1501 A 2000g PLASTICO 2 30
AREA II SABAO ASATP DILUIDO LIQUIDO 1501 A 2000g PLASTICO 1 10
AREA II SABAO TEPOS DILUIDO LIQUIDO 1501 A 2000g PLASTICO 1 15
AREA II SABAO ASATP CAPSULA LIQUIDO 1501 A 2000g PLASTICO 1 25
AREA II SABAO TEPOS CAPSULA LIQUIDO 1501 A 2000g PLASTICO 1 35
AREA II SABAO ASATP DILUIDO LIQUIDO 1501 A 2000g PLASTICO 2 20
AREA II SABAO TEPOS DILUIDO LIQUIDO 1501 A 2000g PLASTICO 2 25
AREA II SABAO TEPOS CAPSULA LIQUIDO 1501 A 2000g PLASTICO 2 20
AREA II SABAO TEPOS CAPSULA LIQUIDO 1501 A 2000g PLASTICO 2 30
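So I am computing the total sales for each unique tuple in the index and storing the result as a totals DataFrame. My goal is to compute the market share of every [fabricante], but for now I am focusing on TEPOS. After pivoting my DataFrame, it looks like this: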
sum sum
vendas_kg vendas_kg
fabricante ASATP TEPOS Total
local tipo ordem
AREA I DILUIDO 1 10 20 30
2 20 30 50
CAPSULA 1 10 20 30
2 20 30 50
AREA II DILUIDO 1 10 15 25
2 20 25 45
CAPSULA 1 25 35 55
2 20 30 50
The code I use to compute the totals and build the MultiIndex DataFrame is:
#creating a sample from all data
a = df.sample(n=50)
#creating a multiindex dataframe
temp_df = pd.pivot_table(a.fillna(value=0), index=['tipo','local','pacote'],columns=['fabricante'], values=['vendas_kg'], fill_value=0, aggfunc=[np.sum])
total = temp_df.sum(level=1, axis=1)
#calculating the marketshare for Tepos
temp_df[('sum','vendas_kg','TEPOS')] = temp_df[('sum','vendas_kg','TEPOS')] / temp_df.sum(level=1, axis=1)
Two things happen: if I use all the columns, all the data turns into NaN, and if I use the code above, I get this error:
ValueError: cannot join with no level specified and no overlapping names
My goal is to end up with something like this:
sum sum
vendas_kg vendas_kg
fabricante ASATP TEPOS % segment Total
local tipo ordem
AREA I DILUIDO 1 33% 66% 50% 30
2 40% 60% 50% 50
CAPSULA 1 33% 66% 50% 30
2 40% 60% 50% 50
AREA II DILUIDO 1 40% 60% 31.25% 25
2 44.44% 55.56% 47.37% 45
CAPSULA 1 43.64% 57.36% 53.63% 55
2 40% 60% 53.63% 50
Can anyone help me? More information about the data and the goal can be found in "Percentage calculation in pivot table pandas with columns".
Answer 0 (score: 1)
Try this:
df_percent = temp_df.apply(lambda x: round(x / x.sum() * 100, 2), axis = 1)
sum
vendas_kg
fabricante ASATP TEPOS
tipo local pacote
CAPSULA AREA I PLASTICO 40.00 60.00
AREA II PLASTICO 22.73 77.27
DILUIDO AREA I PLASTICO 37.50 62.50
AREA II PLASTICO 42.86 57.14
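To see what that apply call does row by row, here is a minimal sketch on made-up data. The frame `sales` and its values are hypothetical and only loosely modelled on the question; the vectorized `div` variant is an alternative I am adding for comparison, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data: one row per (local, tipo), one column per fabricante.
sales = pd.DataFrame(
    {"ASATP": [10, 20], "TEPOS": [20, 30]},
    index=pd.MultiIndex.from_tuples(
        [("AREA I", "DILUIDO"), ("AREA I", "CAPSULA")],
        names=["local", "tipo"],
    ),
)

# axis=1 hands each row to the lambda as a Series, so row.sum() is the row total.
via_apply = sales.apply(lambda row: round(row / row.sum() * 100, 2), axis=1)

# Vectorized alternative: divide every row by its own total, then scale and round.
via_div = sales.div(sales.sum(axis=1), axis=0).mul(100).round(2)

print(via_apply)
print(via_div)
```

Both frames should hold the same percentages; the `div` form avoids calling a Python function once per row, which may matter on larger tables.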
To add the total column, do the following:
df_percent['total'] = total
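The steps above can be put together in one self-contained sketch. It assumes a small hypothetical subset of the question's data (AREA I only) and passes plain strings rather than lists to `pivot_table`, which keeps the columns flat instead of producing the three-level ('sum', 'vendas_kg', ...) header shown in the question. The names `pivot` and `share` are mine.

```python
import pandas as pd

# Hypothetical subset of the question's data (AREA I only).
df = pd.DataFrame({
    "local": ["AREA I"] * 4,
    "fabricante": ["ASATP", "TEPOS", "ASATP", "TEPOS"],
    "tipo": ["DILUIDO", "DILUIDO", "CAPSULA", "CAPSULA"],
    "vendas_kg": [10, 20, 20, 30],
})

# One row per (local, tipo), one column per fabricante.
pivot = pd.pivot_table(
    df,
    index=["local", "tipo"],
    columns="fabricante",
    values="vendas_kg",
    aggfunc="sum",
    fill_value=0,
)

# Compute the row totals before the values are turned into percentages.
total = pivot.sum(axis=1)

# Each manufacturer's share of its row total, in percent.
share = pivot.div(total, axis=0).mul(100).round(2)

# The assignment aligns on the shared (local, tipo) index.
share["total"] = total
print(share)
```

Computing `total` from the raw pivot first matters: summing the percentage frame instead would just give 100 for every row.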
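Explanation
apply is equivalent to a loop, and the axis argument tells it which way to iterate: with axis=1 the function receives one row at a time, running across the columns. All the code does is take each value in a row and divide it by the sum of that whole row. When adding total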