In "Calculating percentiles by factor using ave() in r", I asked how to calculate percentiles inside the ave() function. With that done, I now face a harder task.
Take the following data:
DistrictName Building Name X2.Yr.AVG Thirty Seventy
Ionia Public Schools Emerson -0.337464323 -0.196387489 -0.046524185
Ionia Public Schools Jefferson -0.318673587 -0.196387489 -0.046524185
Ionia Public Schools Ionia Middle -0.290854669 -0.196387489 -0.046524185
Ionia Public Schools Ionia Middle -0.288202752 -0.196387489 -0.046524185
Ionia Public Schools Twin Rivers El -0.23426755 -0.196387489 -0.046524185
Ionia Public Schools R.B. Boyce El -0.202319963 -0.196387489 -0.046524185
Ionia Public Schools Twin Rivers El -0.142995221 -0.196387489 -0.046524185
Ionia Public Schools Emerson -0.141620372 -0.196387489 -0.046524185
Ionia Public Schools Jefferson -0.141407078 -0.196387489 -0.046524185
Ionia Public Schools R.B. Boyce El -0.115530249 -0.196387489 -0.046524185
Ionia Public Schools Ionia Middle -0.111449269 -0.196387489 -0.046524185
Ionia Public Schools Twin Rivers El -0.054918339 -0.196387489 -0.046524185
Ionia Public Schools Jefferson -0.045591501 -0.196387489 -0.046524185
Ionia Public Schools A.A. Rather 0.002251298 -0.196387489 -0.046524185
Ionia Public Schools R.B. Boyce El 0.020669633 -0.196387489 -0.046524185
Ionia Public Schools Emerson 0.065064968 -0.196387489 -0.046524185
Ionia Public Schools A.A. Rather 0.182776319 -0.196387489 -0.046524185
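What I am trying to do is similar to the AVERAGEIF function in Excel. In Excel, I can write =AVERAGEIF(C2:C18, "<-.196387489"), and it returns the mean -0.278630474. I need something in R that lets me do the following: I want to create new variables holding the mean of:
1) any values of X2.Yr.AVG that are less than the value of Thirty
2) any values of X2.Yr.AVG that are greater than the value of Seventy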
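The catch is that I need to do this on a large data frame in which the DistrictName factor has 722 levels. In the step where I calculated the percentiles, I used the ave() function to create the percentiles by the desired factor, as follows: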
MATHgap$Thirty<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
FUN= function(x) quantile(x, 0.3))
and
MATHgap$Seventy<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
FUN= function(x) quantile(x, 0.7))
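Is there a way to do something like AVERAGEIF inside ave(), so that the operation is repeated for each value of DistrictName using only that district's rows and no others? That is, for Ionia Public Schools I want the mean of the X2.Yr.AVG values less than -0.196387489 and the mean of the X2.Yr.AVG values greater than -0.046524185, and I would like to be able to do the same for every district using its own values of X2.Yr.AVG, Thirty, and Seventy.
Apologies if this is confusing.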
Answer 0 (score: 1)
Here is one way to do it using dplyr:
library(dplyr)

# Within each district, average only the values below/above that district's cutoffs
MATHgap %>%
  group_by(DistrictName) %>%
  mutate(MeanLT30 = mean(X2.Yr.AVG[X2.Yr.AVG < Thirty]),
         MeanGT70 = mean(X2.Yr.AVG[X2.Yr.AVG > Seventy]))
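Since the question asks specifically about ave(), here is a rough base R sketch of the same idea. It is not part of the original answer. It relies on the common trick of passing row indices to ave() so that the function applied to each group can look at more than one column. The helper name mean_if_by and the output column names are made up for illustration, and the data frame is rebuilt from the Ionia rows in the question (building names omitted), so treat it as a sketch under those assumptions.

```r
# Sample rows from the question (Ionia Public Schools only; building names omitted)
MATHgap <- data.frame(
  DistrictName = "Ionia Public Schools",
  X2.Yr.AVG = c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
                -0.23426755, -0.202319963, -0.142995221, -0.141620372,
                -0.141407078, -0.115530249, -0.111449269, -0.054918339,
                -0.045591501, 0.002251298, 0.020669633, 0.065064968,
                0.182776319),
  Thirty = -0.196387489,
  Seventy = -0.046524185
)

# Hypothetical helper: for each group in g, take the mean of x over the rows
# where keep is TRUE, and repeat that mean on every row of the group.
# ave() receives row indices, so FUN can subset both x and keep per group.
mean_if_by <- function(x, keep, g) {
  ave(seq_along(x), g, FUN = function(i) mean(x[i][keep[i]]))
}

MATHgap$MeanLT30 <- with(MATHgap,
                         mean_if_by(X2.Yr.AVG, X2.Yr.AVG < Thirty, DistrictName))
MATHgap$MeanGT70 <- with(MATHgap,
                         mean_if_by(X2.Yr.AVG, X2.Yr.AVG > Seventy, DistrictName))

# Should match the Excel AVERAGEIF result quoted in the question
print(MATHgap$MeanLT30[1])
```

One thing to watch for with either approach: if a district has no values below Thirty (or above Seventy), mean() is applied to an empty vector and returns NaN for that district.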