Python equivalent of PROC MEANS

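Time: 2019-06-17 09:04:22

Tags: python python-3.x numpy sas

I am converting SAS code to Python and came across the code below, for which I cannot reproduce the exact value. SAS is supposed to take the weighted mean of the associates column, weighted by the pwgtp column, but my Python attempts give different values.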

proc means data=hhhead1 nway noprint;
 weight pwgtp;
 var associates;
 output out=propassociates (drop=_:) mean=; run;

The result from SAS is 0.2871426408.

I tried several ways to compute the weighted mean in Python. The data has 1.2 million rows; sorry, I cannot share it.

propassociates = hhhead1.groupby(by = ['PWGTP_y'])['associates'].mean().reset_index()

np.mean(propassociates['associates'])

The result is 0.26806426594942845.

hhhead1['weight_sum'] = hhhead1['associates'] * hhhead1['PWGTP_x']
propassociates = hhhead1['weight_sum'].sum() / hhhead1['PWGTP_x'].sum()

The result is 0.08837267780237641.


1 Answer:

Answer 0 (score: 0)

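def wavg(group, avg_name, weight_name):
    """Weighted mean of group[avg_name], weighted by group[weight_name]."""
    d = group[avg_name]
    w = group[weight_name]
    total = w.sum()
    # Dividing NumPy scalars by zero gives nan/inf instead of raising
    # ZeroDivisionError, so check the weight total explicitly and fall
    # back to the plain mean.
    if total == 0:
        return d.mean()
    return (d * w).sum() / total

a = data1.groupby(['GroupByVar']).apply(wavg, "yourVar", "WeightVar")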

This should work.
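Since the SAS step has no CLASS or BY statement, the target is a single overall weighted mean, so no grouping is needed. One possible cause of the mismatch (an assumption, as the data was not shared) is missing values in associates: pandas skips a NaN product in the numerator, while the weight sum in the denominator still counts that row, which pulls the result down. The sketch below uses a hypothetical helper `weighted_mean` and made-up toy data to show the difference. The `_x`/`_y` suffixes also suggest that PWGTP appears in both frames of a merge, so it may be worth checking which of the two matches the SAS pwgtp column.

```python
import numpy as np
import pandas as pd


def weighted_mean(df, value_col, weight_col):
    """Weighted mean of value_col, using only rows where both columns are present."""
    # Drop rows with a missing value or weight so that the numerator and
    # the denominator are summed over the same set of rows.
    valid = df[[value_col, weight_col]].dropna()
    return np.average(valid[value_col], weights=valid[weight_col])


# Hypothetical toy data standing in for hhhead1 (the real data was not shared).
toy = pd.DataFrame({
    "associates": [1.0, 0.0, np.nan, 1.0],
    "pwgtp": [10.0, 30.0, 40.0, 20.0],
})

masked = weighted_mean(toy, "associates", "pwgtp")

# Naive version: Series.sum() skips the NaN product in the numerator,
# but the denominator still includes that row's weight.
naive = (toy["associates"] * toy["pwgtp"]).sum() / toy["pwgtp"].sum()

print(masked, naive)
```

If the two numbers differ on the real data as well, missing values are at least part of the gap; if they agree, the weight column is the next thing to check.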