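I am converting SAS to Python and have run into this code, where I cannot match the exact value. The SAS step takes the weighted mean of the associates column, using the pwgtp column as the weight, but my Python attempts produce values that do not match.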
The answer from SAS is 0.2871426408
I tried several ways to get the weighted mean. The data contains 1.2 million rows. Sorry, I cannot share the data.
proc means data=hhhead1 nway noprint;
weight pwgtp;
var associates;
output out=propassociates (drop=_:) mean=; run;
propassociates = hhhead1.groupby(by = ['PWGTP_y'])['associates'].mean().reset_index()
np.mean(propassociates['associates'])
The answer is 0.26806426594942845
hhhead1['weight_sum'] = hhhead1['associates'] * hhhead1['PWGTP_x']
propassociates = hhhead1['weight_sum'].sum() / hhhead1['PWGTP_x'].sum()
The answer is 0.08837267780237641
Answer 0: (score: 0)
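def wavg(group, avg_name, weight_name):
    # Weighted average of group[avg_name], weighted by group[weight_name].
    d = group[avg_name]
    w = group[weight_name]
    total = w.sum()
    # With numeric columns, pandas/NumPy division by zero yields nan or inf
    # rather than raising ZeroDivisionError, so test the weight total directly.
    if total == 0:
        return d.mean()
    return (d * w).sum() / total

a = data1.groupby(['GroupByVar']).apply(wavg, "yourVar", "WeightVar")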
This should work.
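Since the PROC MEANS step in the question has no CLASS or BY variable, it may be worth checking an ungrouped weighted mean as well. The sketch below is only a guess at the cause of the mismatch, not a confirmed diagnosis: it assumes some rows have a missing `associates` value, which `(associates * weight).sum() / weight.sum()` keeps in the denominator, while PROC MEANS (as far as I know) leaves those rows out entirely. The helper name `weighted_mean` and the toy data are made up for illustration. Separately, the `_x`/`_y` suffixes look like the result of a pandas merge, so it may also be worth confirming which of `PWGTP_x` and `PWGTP_y` corresponds to the SAS `pwgtp`.

```python
import numpy as np
import pandas as pd

def weighted_mean(df, value_col, weight_col):
    """Weighted mean using only rows where both value and weight are present."""
    # Drop rows with a missing value or weight so they leave both the
    # numerator and the denominator (assumed to mirror PROC MEANS).
    valid = df[[value_col, weight_col]].dropna()
    # Assumption: rows with non-positive weights contribute nothing to the mean.
    valid = valid[valid[weight_col] > 0]
    return np.average(valid[value_col], weights=valid[weight_col])

# Toy data, invented for illustration (not the poster's data).
toy = pd.DataFrame({
    "associates": [1.0, 0.0, np.nan, 1.0],
    "PWGTP": [10, 30, 60, 20],
})

# Naive version: the NaN row is skipped in the numerator only.
naive = (toy["associates"] * toy["PWGTP"]).sum() / toy["PWGTP"].sum()
# Careful version: the NaN row is removed from both sums.
careful = weighted_mean(toy, "associates", "PWGTP")
print(naive, careful)
```

On the toy data the two results differ because the row with the missing value still adds its weight of 60 to the naive denominator. If the real data has no missing values in `associates`, this will not explain the gap and the weight column is the next thing to check.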