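Weighted average of a pandas pivot table with numeric column names and a numeric row index

Time: 2019-05-27 02:39:33

Tags: python-3.x pandas pivot-table

How do I calculate the weighted average along each axis, "pot" and "graft", weighted by the (summed) "rez" values that make up the body of the pivot table? In this case the row index and the column names are numbers, not strings.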

import pandas as pd
import numpy as np
import random

# CREATE DATA
pot = np.linspace(0,80,num=9)
graft = np.linspace(0.30,0.80,num=11)

# Collect the rows in plain lists (Series.append was removed in pandas 2.0)
pots = []
grafts = []
rezs = []

for i in range(1, 5):
    for pott in pot:
        for graftt in graft:
            rez = random.uniform(-2, 4)
            rez = round(rez, 2)
            pots.append(pott)
            grafts.append(graftt)
            rezs.append(rez)

# CREATE DATA FRAME
df = pd.DataFrame({'pot' : pots, 'graft' : grafts, 'rez': rezs })

# CREATE PIVOT TABLE
pivot = pd.pivot_table(data=df, values='rez', index=['pot'], columns=['graft'], aggfunc='sum')

# REPLACE NEGATIVE VALUES WITH ZEROES
pivot[pivot < 0] = 0

# Calculate Weights
weights = pivot / (pivot.sum().sum())
pivotsum = pivot.sum().sum()

Pivot table:

graft  0.30   0.35  0.40  0.45   0.50   0.55  0.60  0.65  0.70  0.75   0.80
pot                                                                        
0.0    5.66   0.37  3.92  4.99   1.38   7.89  7.04  3.83  3.88  5.48   4.08
10.0   5.44   4.34  2.26  4.55   4.89   4.50  2.07  3.94  2.66  3.77  11.26
20.0   0.00   4.47  7.15  1.20   4.55  11.40  3.33  0.41  2.34  4.20   7.17
30.0   2.88   1.88  8.55  4.60   4.07  13.58  6.06  3.79  9.25  4.21   1.63
40.0   5.06   0.63  4.20  6.68   5.15   7.93  2.03  7.92  6.94  0.00   1.99
50.0   6.57  11.63  3.80  6.69   6.74   5.71  3.48  4.48  0.00  3.20   4.18
60.0   6.46   5.69  5.72  0.00  13.71   5.03  3.82  9.91  4.02  1.12   1.81
70.0   0.82   8.50  4.79  3.82   1.50   5.66  2.57  0.00  6.91  6.12   4.55
80.0   8.90   4.58  5.01  3.47   4.42   0.08  4.63  0.00  2.77  0.96   3.30

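I want to calculate the weighted average "graft" and the weighted average "pot", each weighted by the (summed) "rez" values that make up the body of the pivot table.

I have managed to create the weights, but I don't know how to access the index and the column names to calculate the weighted averages.

The actual data will not be evenly distributed.

Desired output:

weighted average graft = (0.30 * 5.66 / pivotsum) + (0.30 * 5.44 / pivotsum) + ... + (0.80 * 3.30 / pivotsum)

weighted average pot = (0 * 5.66 / pivotsum) + (10 * 5.44 / pivotsum) + ... + (80 * 3.30 / pivotsum)

Together, these two weighted averages describe a "point" in the pivot table: the point at which the table would balance.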
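Written out in general form, the two sums above would look roughly as follows. The symbols are assumptions introduced here for illustration and do not appear in the original post: p_i is the pot value of row i, g_j is the graft value of column j, and r_ij is the summed rez in cell (i, j) after negative values have been set to zero.

```latex
\bar{g} = \frac{\sum_{i}\sum_{j} g_j \, r_{ij}}{\sum_{i}\sum_{j} r_{ij}},
\qquad
\bar{p} = \frac{\sum_{i}\sum_{j} p_i \, r_{ij}}{\sum_{i}\sum_{j} r_{ij}}
```

The shared denominator is the quantity called pivotsum in the code.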

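1 Answer:

Answer 0: (score: 0)

pandas.DataFrame.reset_index and pandas.melt do the job. The last two lines of the original code (the weights and pivotsum calculations) can be dropped. We reset the pivot table's index so that "pot" becomes an ordinary column, then melt the remaining columns while keeping "pot" as the id variable for melt. The weighted averages can then be computed on the fully melted pivot table. There may be a more direct solution, but this does the trick.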
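To see what the two steps do before reading the full script, here is a minimal sketch on a made-up 2x2 pivot table. The values are hypothetical and chosen only for illustration, and var_name is passed explicitly here as an assumption for readability; without it, melt falls back to the name of the columns, which is also "graft".

```python
import pandas as pd

# Hypothetical 2x2 pivot table: numeric row index "pot", numeric columns "graft".
pivot = pd.DataFrame(
    [[1.0, 3.0], [2.0, 4.0]],
    index=pd.Index([0.0, 10.0], name="pot"),
    columns=pd.Index([0.3, 0.8], name="graft"),
)

# reset_index turns the "pot" index into an ordinary column.
wide = pivot.reset_index()

# melt stacks the remaining columns into one row per (pot, graft) cell.
long = pd.melt(wide, id_vars=["pot"], var_name="graft", value_name="Rez")
print(long)
```

Each row of the melted frame now carries its own pot value, graft value and Rez value, which is what makes the weighted sums straightforward.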

import pandas as pd
import numpy as np
import random

# CREATE DATA
pot = np.linspace(0,80,num=9)
graft = np.linspace(0.30,0.80,num=11)

# Collect the rows in plain lists (Series.append was removed in pandas 2.0)
pots = []
grafts = []
rezs = []

for i in range(1, 5):
    for pott in pot:
        for graftt in graft:
            rez = random.uniform(-2, 4)
            rez = round(rez, 2)
            pots.append(pott)
            grafts.append(graftt)
            rezs.append(rez)

# CREATE DATA FRAME
df = pd.DataFrame({'pot' : pots, 'graft' : grafts, 'rez': rezs })

# CREATE PIVOT TABLE
pivot = pd.pivot_table(data=df, values='rez', index=['pot'], columns=['graft'], aggfunc='sum')

# PRINT OUT INITIAL PIVOT TABLE
print("\nOriginal Pivot:\n",pivot)

# REPLACE NEGATIVE VALUES WITH ZEROES
pivot[pivot < 0] = 0
print("\nPost negative filtered Pivot:\n",pivot)

# RESET INDEX ON PIVOT TABLE
unpack = pivot.reset_index()

# MELT DATAFRAME COLUMNS
unpack = pd.melt(unpack, id_vars=['pot'], value_name='Rez')

# CALCULATE AND DISPLAY WEIGHTED AVERAGES
totrez = unpack['Rez'].sum()
unpack['Weight'] = unpack['Rez'] / totrez
wavpot = (unpack['pot'] * unpack['Weight']).sum()
wavpot = round(wavpot, 2)
print("\nWeighted Average pot:", wavpot)
wavgraft = (unpack['graft'] * unpack['Weight']).sum()
wavgraft = round(wavgraft, 2)
print("Weighted Average graft:", wavgraft)
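
The answer mentions that a more direct solution may exist. One possible sketch, not part of the original answer, reads the numeric labels straight from pivot.index and pivot.columns and weights them by the row and column totals using numpy.average. The 2x2 table below is hypothetical and stands in for the real pivot; clip(lower=0) is used as an equivalent of zeroing the negative cells.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table (made-up values) with numeric labels on both axes.
pivot = pd.DataFrame(
    [[1.0, 3.0], [2.0, 4.0]],
    index=pd.Index([0.0, 10.0], name="pot"),
    columns=pd.Index([0.3, 0.8], name="graft"),
)

# Negative sums carry no weight, as in the code above.
pivot = pivot.clip(lower=0)

# Marginal weights: total rez per pot (row sums) and per graft (column sums).
row_weights = pivot.sum(axis=1).to_numpy()
col_weights = pivot.sum(axis=0).to_numpy()

# The axis labels are numeric, so they can be averaged directly.
wavpot = np.average(pivot.index.to_numpy(dtype=float), weights=row_weights)
wavgraft = np.average(pivot.columns.to_numpy(dtype=float), weights=col_weights)

print("Weighted Average pot:", round(wavpot, 2))
print("Weighted Average graft:", round(wavgraft, 2))
```

Because every cell in a row shares the same pot value, summing the row first gives the same result as weighting cell by cell; the same holds for columns and graft. Note that numpy.average raises ZeroDivisionError if all weights are zero, which would happen if every cell of the pivot were negative before clipping.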