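I am trying to find the mean (average) of one variable within each level of a different variable that I have defined.
So far I have created a new variable with several levels associated with it: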
pincome$income_growth <- ifelse(pincome$incomechng <= 0, "level 1",
                         ifelse(pincome$incomechng < 1, "level 2", "level 3"))
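Now I want to determine the average of another variable for each of the levels above (e.g. the average income for level 1, where income growth is at or below 0%).
I hope this makes sense; I am very new to R and trying to get to grips with it!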
Answer 0 (score: 0)
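If you want base R, try by (see ?by). If you start doing more complex things, the plyr / dplyr packages are pretty amazing, and if you are going to work with very large datasets and don't mind a somewhat steeper initial learning curve, the data.table package is great too.
E.g.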
set.seed(1) # so your random numbers are the same as mine
pincome <- data.frame(incomechng = runif(20, min=-1, max=3))
# what you had was fine too; using ?cut is another way to do it
# have just put it in for demonstration purposes.
# though `cut` uses intervals like (a, b] or [a, b) whereas yours
# are (-Inf, 0] (0, 1) [1, Inf) which is a little different.
pincome$income_growth <- cut(pincome$incomechng,
                             breaks=c(-Inf, 0, 1, Inf),
                             labels=paste("level", 1:3))
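To make the interval difference noted in the comments concrete, here is a minimal sketch comparing the nested ifelse from the question with cut. The vector x is made up for illustration and deliberately includes the break points 0 and 1; it is not part of the original data.

```r
# Hypothetical values chosen to sit on and around the break points
x <- c(-0.5, 0, 0.5, 1, 1.5)

# Nested ifelse from the question: (-Inf, 0], (0, 1), [1, Inf)
by_ifelse <- ifelse(x <= 0, "level 1",
             ifelse(x < 1, "level 2", "level 3"))

# cut with its default right = TRUE: (-Inf, 0], (0, 1], (1, Inf]
by_cut <- as.character(cut(x, breaks = c(-Inf, 0, 1, Inf),
                           labels = paste("level", 1:3)))

# Side-by-side comparison; only x == 1 is classified differently
data.frame(x, by_ifelse, by_cut, same = by_ifelse == by_cut)
```

With continuous data such as the runif draws used here, a value landing exactly on a break is unlikely, so the two approaches should agree in practice; with rounded data the boundary rule may matter.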
Now we can take the mean within each group. I've shown three options; I'm sure there are more.
# base R ?by
by(pincome$incomechng, pincome$income_growth, mean)
# pincome$income_growth: level 1
# [1] -0.6848674
# ------------------------------------------
# pincome$income_growth: level 2
# [1] 0.4132334
# ------------------------------------------
# pincome$income_growth: level 3
# [1] 1.772039
# plyr (dplyr has pipe syntax you may prefer but is otherwise the same)
library(plyr)
ddply(pincome, .(income_growth), summarize, avgIncomeGrowth=mean(incomechng))
# income_growth avgIncomeGrowth
# 1 level 1 -0.6848674
# 2 level 2 0.4132334
# 3 level 3 1.7720395
# data.table
library(data.table)
setDT(pincome)
pincome[, list(avgIncomeGrowth=mean(incomechng)), by=income_growth]
# income_growth avgIncomeGrowth
# 1: level 2 0.4132334
# 2: level 3 1.7720395
# 3: level 1 -0.6848674
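Two further base R options that may be worth knowing are tapply and aggregate. The sketch below rebuilds the same example data as a plain data.frame so it can be run on its own; the names group_means and agg are just illustrative.

```r
# Recreate the example data as a plain data.frame
set.seed(1)
pincome <- data.frame(incomechng = runif(20, min = -1, max = 3))
pincome$income_growth <- cut(pincome$incomechng,
                             breaks = c(-Inf, 0, 1, Inf),
                             labels = paste("level", 1:3))

# tapply returns a named array with one mean per level
group_means <- tapply(pincome$incomechng, pincome$income_growth, mean)
group_means

# aggregate returns a data frame, which is handy for merging or plotting
agg <- aggregate(incomechng ~ income_growth, data = pincome, FUN = mean)
agg
```

tapply is convenient when you only need the numbers; aggregate is closer in spirit to the ddply and data.table results because it keeps the grouping column alongside the summary.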
Answer 1 (score: 0)
If you want a tidyverse solution:
library(tidyverse)
pincome %>%
  mutate(income_growth = case_when(incomechng <= 0 ~ "level 1",
                                   incomechng < 1 ~ "level 2",
                                   TRUE ~ "level 3")) %>%
  group_by(income_growth) %>%
  summarize(avgIncomeGrowth = mean(incomechng, na.rm = TRUE))
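The na.rm=TRUE argument matters once the variable being averaged contains missing values. Here is a small base R sketch of the effect, averaging a separate income column per level as the question asks. The income column and all of its values are hypothetical, made up purely for illustration.

```r
# Hypothetical data: income has one missing value in the "level 2" group
pincome <- data.frame(
  incomechng = c(-0.4, -0.2, 0.3, 0.5, 0.7, 1.5, 2.5),
  income     = c(30, 34, 40, NA, 44, 60, 70)
)
pincome$income_growth <- ifelse(pincome$incomechng <= 0, "level 1",
                         ifelse(pincome$incomechng < 1, "level 2", "level 3"))

# Without na.rm, a single NA makes the whole group mean NA
with_na <- tapply(pincome$income, pincome$income_growth, mean)

# Extra arguments to tapply are passed on to mean, so the NA is dropped first
without_na <- tapply(pincome$income, pincome$income_growth, mean, na.rm = TRUE)

with_na
without_na
```

Whether dropping missing values is appropriate depends on why they are missing, so it may be worth counting the NAs per group before relying on the averaged result.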