我已经编写了一些代码来计算使用pandas中的数据透视表的加权平均值。但是,我不确定如何添加执行加权平均的实际列(添加新列,其中每行包含'cumulative'/'COUNT'的值)。
数据如下:
VALUE COUNT GRID agb
1 43 1476 1051
2 212 1476 2983
5 7 1477 890
4 1361 1477 2310
这是我的代码:
# Read input data
lup_df = pandas.DataFrame.from_csv(o_dir+LUP+'.csv',index_col=False)
# Insert a new column with area * variable
lup_df['cumulative'] = lup_df['COUNT']*lup_df['agb']
# Create and output pivot table
lup_pvt = pandas.pivot_table(lup_df, 'agb', rows=['GRID'])
# TODO: Add a new column where each row contains value of 'cumulative'/'COUNT'
lup_pvt.to_csv(o_dir+PIVOT+'.csv',index=True,header=True,sep=',')
我该怎么做?
答案 0 :(得分:9)
因此,对于grid
的每个值,您希望agb
列的加权平均值,其中权重是count
列中的值。如果这种解释是正确的,我认为这可以解决groupby
:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
'grid': np.random.choice(np.arange(10)+50, n),
'value': np.random.randn(n) + 12})
df['prod'] = df['count'] * df['value']
grouped = df.groupby('grid').sum()
grouped['wtdavg'] = grouped['prod'] / grouped['count']
print grouped
count value prod wtdavg
grid
50 22 57.177042 243.814417 11.082474
51 27 58.801386 318.644085 11.801633
52 11 34.202619 135.127942 12.284358
53 24 59.340084 272.836636 11.368193
54 39 137.268317 482.954857 12.383458
55 47 79.468986 531.122652 11.300482
56 17 38.624369 214.188938 12.599349
57 22 38.572429 279.948202 12.724918
58 27 36.492929 327.315518 12.122797
59 34 60.851671 408.306429 12.009013
或者,如果你想要有点光滑并写一个加权平均函数,你可以反复使用:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
'grid': np.random.choice(np.arange(10)+50, n),
'value': np.random.randn(n) + 12})
def wavg(val_col_name, wt_col_name):
def inner(group):
return (group[val_col_name] * group[wt_col_name]).sum() / group[wt_col_name].sum()
inner.__name__ = 'wtd_avg'
return inner
slick = df.groupby('grid').apply(wavg('value', 'count'))
print slick
grid
50 11.082474
51 11.801633
52 12.284358
53 11.368193
54 12.383458
55 11.300482
56 12.599349
57 12.724918
58 12.122797
59 12.009013
dtype: float64