Suppose I have the following data:
Year Month State ppo
2011 Jan CA 220
2011 Feb CA 250
2012 Jan CA 230
2011 Jan WA 200
2011 Feb WA 210
I need to compute the average for each state within each year, so the output would look like this:
Year Month State ppo annualAvg
2011 Jan CA 220 235
2011 Feb CA 250 235
2012 Jan CA 230 230
2011 Jan WA 200 205
2011 Feb WA 210 205
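where the annual average is the mean of all entries for that state in the same year. If the year and state were constants I would know how to do this, but the fact that they are variables is throwing me off.
Looking around, it seems ddply might be what I want here (https://stats.stackexchange.com/questions/8225/how-to-summarize-data-by-group-in-r), but when I tried it I did something wrong and kept getting errors (I have tried so many variations that I won't bother posting them here). Any idea how I should actually do this?
Thanks for your help!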
Answer 0 (score: 1)
Try this:
library(data.table)
setDT(df)
df[ , annualAvg := mean(ppo) , by =.(Year, State) ]
Answer 1 (score: 0)
Using dplyr, with group_by %>% mutate to add the column:
library(dplyr)
df %>% group_by(Year, State) %>% mutate(annualAvg = mean(ppo))
#Source: local data frame [5 x 5]
#Groups: Year, State [3]
# Year Month State ppo annualAvg
# (int) (fctr) (fctr) (int) (dbl)
#1 2011 Jan CA 220 235
#2 2011 Feb CA 250 235
#3 2012 Jan CA 230 230
#4 2011 Jan WA 200 205
#5 2011 Feb WA 210 205
Using data.table:
library(data.table)
setDT(df)[, annualAvg := mean(ppo), .(Year, State)]
df
# Year Month State ppo annualAvg
#1: 2011 Jan CA 220 235
#2: 2011 Feb CA 250 235
#3: 2012 Jan CA 230 230
#4: 2011 Jan WA 200 205
#5: 2011 Feb WA 210 205
Data:
# Sample input as a plain data.frame; the answers above add the annualAvg column.
df <- structure(list(Year = c(2011L, 2011L, 2012L, 2011L, 2011L), Month = structure(c(2L,
1L, 2L, 2L, 1L), .Label = c("Feb", "Jan"), class = "factor"),
State = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("CA",
"WA"), class = "factor"), ppo = c(220L, 250L, 230L, 200L,
210L)), .Names = c("Year", "Month", "State", "ppo"),
class = "data.frame", row.names = c(NA, -5L))
Answer 2 (score: 0)
Base R:
df$ppoAvg <- ave(df$ppo, df$State, df$Year, FUN = mean)
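For anyone who wants to try the base R approach without installing a package, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand (the data.frame() call is my own reconstruction, with ppo stored as plain numerics, and the new column is named annualAvg to match the question rather than ppoAvg) and then applies ave() grouped by State and Year.

```r
# Rebuild the sample data from the question as a plain data frame.
# (Reconstruction for illustration; ppo is stored as numeric here.)
df <- data.frame(
  Year  = c(2011, 2011, 2012, 2011, 2011),
  Month = c("Jan", "Feb", "Jan", "Jan", "Feb"),
  State = c("CA", "CA", "CA", "WA", "WA"),
  ppo   = c(220, 250, 230, 200, 210)
)

# ave() applies FUN within each State/Year combination and returns a
# vector as long as the input, so it can be assigned as a new column.
df$annualAvg <- ave(df$ppo, df$State, df$Year, FUN = mean)

print(df)
# annualAvg column: 235 235 230 205 205
```

Because ave() returns one value per input row rather than one per group, no merge back onto the original rows is needed, which is what makes it a convenient fit for this kind of "add a group-level column" task.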