Efficiently computing a weighted average in a Pandas DataFrame

Time: 2018-05-19 03:49:45

Tags: python-3.x pandas dataframe pandas-groupby

Consider the DataFrame df generated below:

import numpy as np  # needed for the np.sum calls below
import pandas as pd

def creatingDataFrame():

    raw_data = {'code': [1, 2, 3, 2 , 3, 3],                
                'var1': [10, 20, 30, 20 , 30, 30],
                'var2': [2,4,6,4,6,6],
                'price': [20, 30, 40 , 50, 10, 20],
                'sells': [3, 4 , 5, 1, 2, 3]}
    df = pd.DataFrame(raw_data, columns = ['code', 'var1','var2', 'price', 'sells'])
    return df


if __name__=="__main__":

    df=creatingDataFrame()

    setCode=set(df['code'])


    listDF=[]
    for code in setCode:
        dfCode=df[df['code'] == code].copy()
        print(dfCode)
        lenDfCode=len(dfCode)
        if(lenDfCode==1):
            theData={'code': [dfCode['code'].iloc[0]],                
                'var1': [dfCode['var1'].iloc[0]],
                'var2': [dfCode['var2'].iloc[0]],
                'averagePrice': [dfCode['price'].iloc[0]],
                'totalSells': [dfCode['sells'].iloc[0]]
            }
        else:
            dfCode['price*sells']=dfCode['price']*dfCode['sells']
            sumSells=np.sum(dfCode['sells'])
            sumProducts=np.sum(dfCode['price*sells'])
            dfCode['totalSells']=sumSells
            av=sumProducts/sumSells
            dfCode['averagePrice']=av
            theData={'code': [dfCode['code'].iloc[0]],                
                'var1': [dfCode['var1'].iloc[0]],
                'var2': [dfCode['var2'].iloc[0]],
                'averagePrice': [dfCode['averagePrice'].iloc[0]],
                'totalSells': [dfCode['totalSells'].iloc[0]]
            }
        dfPart=pd.DataFrame(theData, columns = ['code', 'var1','var2', 'averagePrice','totalSells'])
        listDF.append(dfPart)
    newDF = pd.concat(listDF)
    print(newDF)

I have this DataFrame:

   code  var1  var2  price  sells
0     1    10     2     20      3
1     2    20     4     30      4
2     3    30     6     40      5
3     2    20     4     50      1
4     3    30     6     10      2
5     3    30     6     20      3

I want to produce the following DataFrame:

   code  var1  var2  averagePrice  totalSells
0     1    10     2          20.0           3
0     2    20     4          34.0           5
0     3    30     6          28.0          10

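Note that this DataFrame is built by computing, for each code, the average price weighted by sells and the total sells. Also, var1 and var2 are the same for every row that shares a code. The Python code above does this, but I know it is inefficient. I believe the desired result can be obtained with groupby, but I have not been able to produce it.

1 Answer:

Answer 0: (score: 2)

This case is a little different: use apply and return a pd.Series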

df.groupby(['code','var1','var2']).apply(lambda x : pd.Series({'averagePrice': sum(x['sells']*x['price'])/sum(x['sells']),'totalSells':sum(x['sells'])})).reset_index()
Out[366]: 
   code  var1  var2  averagePrice  totalSells
0     1    10     2          20.0         3.0
1     2    20     4          34.0         5.0
2     3    30     6          28.0        10.0
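
For larger frames, a possible alternative to the apply call above is to compute price * sells once and let groupby do plain sums. This is only a sketch, not part of the original answer; the helper column name revenue is an assumption introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'code':  [1, 2, 3, 2, 3, 3],
    'var1':  [10, 20, 30, 20, 30, 30],
    'var2':  [2, 4, 6, 4, 6, 6],
    'price': [20, 30, 40, 50, 10, 20],
    'sells': [3, 4, 5, 1, 2, 3],
})

# 'revenue' is a hypothetical helper column: price * sells per row.
totals = (
    df.assign(revenue=df['price'] * df['sells'])
      .groupby(['code', 'var1', 'var2'])[['sells', 'revenue']]
      .sum()            # per-group total sells and total revenue
      .reset_index()
)

# Weighted average price = sum(price * sells) / sum(sells) for each group.
result = (
    totals.assign(averagePrice=totals['revenue'] / totals['sells'])
          .rename(columns={'sells': 'totalSells'})
          [['code', 'var1', 'var2', 'averagePrice', 'totalSells']]
)
print(result)
```

Because no Python lambda runs per group, this should scale better than apply. totalSells also stays an integer column here, whereas packing both values into one pd.Series upcasts it to float, as seen in the output above.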