Calculating AVERAGEIF in R by factor level

Posted: 2015-06-29 17:07:20

Tags: r excel

In "Calculating percentiles by factor using ave() in r", I asked how to calculate percentiles with the ave() function. Having done that, I now face a harder task.

Take the following data:

DistrictName            Building Name   X2.Yr.AVG       Thirty          Seventy
Ionia Public Schools    Emerson         -0.337464323    -0.196387489    -0.046524185
Ionia Public Schools    Jefferson       -0.318673587    -0.196387489    -0.046524185
Ionia Public Schools    Ionia Middle    -0.290854669    -0.196387489    -0.046524185
Ionia Public Schools    Ionia Middle    -0.288202752    -0.196387489    -0.046524185
Ionia Public Schools    Twin Rivers El  -0.23426755     -0.196387489    -0.046524185
Ionia Public Schools    R.B. Boyce El   -0.202319963    -0.196387489    -0.046524185
Ionia Public Schools    Twin Rivers El  -0.142995221    -0.196387489    -0.046524185
Ionia Public Schools    Emerson         -0.141620372    -0.196387489    -0.046524185
Ionia Public Schools    Jefferson       -0.141407078    -0.196387489    -0.046524185
Ionia Public Schools    R.B. Boyce El   -0.115530249    -0.196387489    -0.046524185
Ionia Public Schools    Ionia Middle    -0.111449269    -0.196387489    -0.046524185
Ionia Public Schools    Twin Rivers El  -0.054918339    -0.196387489    -0.046524185
Ionia Public Schools    Jefferson       -0.045591501    -0.196387489    -0.046524185
Ionia Public Schools    A.A. Rather     0.002251298     -0.196387489    -0.046524185
Ionia Public Schools    R.B. Boyce El   0.020669633     -0.196387489    -0.046524185
Ionia Public Schools    Emerson         0.065064968     -0.196387489    -0.046524185
Ionia Public Schools    A.A. Rather     0.182776319     -0.196387489    -0.046524185

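What I am trying to do is similar to the AVERAGEIF function in Excel. In Excel I can write =AVERAGEIF(C2:C18, "<-.196387489"), and it returns the average -0.278630474. I need something in R that lets me create new variables holding the mean of:

1) the values of X2.Yr.AVG that are less than the value of Thirty
2) the values of X2.Yr.AVG that are greater than the value of Seventy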
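For a single district, the Excel formula maps directly onto R's logical subsetting. Below is a minimal base-R sketch that uses the sample values above. The variable names `x2_yr_avg`, `thirty` and `seventy` are illustrative and do not come from the original post:

```r
# X2.Yr.AVG for Ionia Public Schools, copied from the sample table above
x2_yr_avg <- c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
               -0.23426755, -0.202319963, -0.142995221, -0.141620372,
               -0.141407078, -0.115530249, -0.111449269, -0.054918339,
               -0.045591501, 0.002251298, 0.020669633, 0.065064968,
               0.182776319)
thirty  <- -0.196387489   # Thirty value for this district
seventy <- -0.046524185   # Seventy value for this district

# =AVERAGEIF(C2:C18, "<-.196387489"): subset with a logical index, then average
mean_lt30 <- mean(x2_yr_avg[x2_yr_avg < thirty])

# The same pattern for the values above the Seventy threshold
mean_gt70 <- mean(x2_yr_avg[x2_yr_avg > seventy])

print(mean_lt30)  # -0.2786305, matching Excel's -0.278630474
print(mean_gt70)
```

The hard part is repeating this for every district rather than doing it once.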

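The problem is that I need to do this in a large data frame in which the DistrictName factor has 722 levels. When I calculated the percentiles, I used ave() to compute them by the desired factor, like this: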

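    # 30th and 70th percentiles of X2.Yr.AVG within each district;
    # ave() repeats each district's value on every row of that district
    MATHgap$Thirty <- ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
                          FUN = function(x) quantile(x, 0.3))

    MATHgap$Seventy <- ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
                           FUN = function(x) quantile(x, 0.7))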

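Is there a way to do something like AVERAGEIF inside ave(), so that the calculation is repeated separately for each value of DistrictName? For Ionia Public Schools, I want the mean of the X2.Yr.AVG values less than -0.196387489 and the mean of the X2.Yr.AVG values greater than -0.046524185. I also want to run the same calculation for every district, using that district's own X2.Yr.AVG, Thirty, and Seventy values.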

Apologies if this is confusing.
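ave() passes FUN only the X2.Yr.AVG values of one group at a time, so FUN cannot see the Thirty or Seventy columns. One workaround is to compute the percentile and the conditional mean inside the same function. This sketch assumes the thresholds are the same within-district quantiles computed above. The two-district data frame is a made-up toy example, and `mean_below` and `mean_above` are hypothetical helper names:

```r
# Hypothetical toy data: two districts with made-up scores
MATHgap <- data.frame(
  DistrictName = c("A", "A", "A", "A", "B", "B", "B"),
  X2.Yr.AVG    = c(1, 2, 3, 4, 10, 20, 30)
)

# Mean of the values strictly below the group's p-th quantile
mean_below <- function(x, p) mean(x[x < quantile(x, p)])
# Mean of the values strictly above the group's p-th quantile
mean_above <- function(x, p) mean(x[x > quantile(x, p)])

# ave() calls FUN once per district and repeats the result on each of its rows
MATHgap$MeanLT30 <- ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
                        FUN = function(x) mean_below(x, 0.3))
MATHgap$MeanGT70 <- ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
                        FUN = function(x) mean_above(x, 0.7))

print(MATHgap)
```

Every row receives its own district's result, in the same way as Thirty and Seventy. If a district has no values strictly below (or above) its threshold, mean() of the empty subset returns NaN.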

1 Answer:

Answer (score: 1)

Here is a solution using dplyr:
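    library(dplyr)

    # Per district, average X2.Yr.AVG only where it is below Thirty / above Seventy
    MATHgap <- MATHgap %>%
      group_by(DistrictName) %>%
      mutate(MeanLT30 = mean(X2.Yr.AVG[X2.Yr.AVG < Thirty]),
             MeanGT70 = mean(X2.Yr.AVG[X2.Yr.AVG > Seventy])) %>%
      ungroup()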
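If you would rather stay in base R, you can express the same per-district logic with ave() by grouping row indices instead of values. This is a common workaround for ave() seeing only one vector. The following is a minimal sketch. `group_mean_if` is a hypothetical helper name, and the data frame is rebuilt from the question's sample rows, with building names omitted:

```r
# Sample rows from the question (a single district)
MATHgap <- data.frame(
  DistrictName = "Ionia Public Schools",
  X2.Yr.AVG = c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
                -0.23426755, -0.202319963, -0.142995221, -0.141620372,
                -0.141407078, -0.115530249, -0.111449269, -0.054918339,
                -0.045591501, 0.002251298, 0.020669633, 0.065064968,
                0.182776319),
  Thirty  = -0.196387489,
  Seventy = -0.046524185
)

# Hypothetical helper: ave() hands FUN the row indices of one district,
# so FUN can read any column (here Thirty/Seventy) for those rows
group_mean_if <- function(df, keep) {
  ave(seq_len(nrow(df)), df$DistrictName, FUN = function(i) {
    rows <- df[i, ]
    mean(rows$X2.Yr.AVG[keep(rows)])
  })
}

MATHgap$MeanLT30 <- group_mean_if(MATHgap, function(d) d$X2.Yr.AVG < d$Thirty)
MATHgap$MeanGT70 <- group_mean_if(MATHgap, function(d) d$X2.Yr.AVG > d$Seventy)
```

With only the Ionia rows, every row gets a MeanLT30 of -0.278630474 (the Excel result) and a MeanGT70 of about 0.0450341. With all 722 districts, each district is handled separately in the same way.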