可能重复:
faster way to create variable that aggregates a column by id
我在项目上遇到了麻烦。我创建了一个长格式的数据框(称为dat)(我复制了下面的前三行),我想计算一下2000年到2011年美国所有银行税前收入的平均值。我怎么办?去做?我几乎没有R的经验。如果答案太明显,我很抱歉,但我找不到任何东西,而且我已经花了很多时间在这个项目上。先感谢您!
KeyItem Bank Country Year Value
1 Pretax Income WELLS_FARGO_&_COMPANY UNITED STATES 2011 2.365600e+10
2 Total Assets WELLS_FARGO_&_COMPANY UNITED STATES 2011 1.313867e+12
3 Total Liabilities WELLS_FARGO_&_COMPANY UNITED STATES 2011 1.172180e+12
答案 0 :(得分:1)
以下内容应该让您入门。你基本上需要做两件事:子集和聚合。我将演示基础R解决方案和data.table
解决方案。
首先,一些样本数据。
set.seed(1) # So you can reproduce my results
dat <- data.frame(KeyItem = rep(c("Pretax", "TotalAssets", "TotalLiabilities"),
times = 30),
Bank = rep(c("WellsFargo", "BankOfAmerica", "ICICI"),
each = 30),
Country = rep(c("UnitedStates", "India"), times = c(60, 30)),
Year = rep(c(2000:2009), each = 3, times = 3),
Value = runif(90, min=300, max=600))
让我们将“Pretax”值的平均值按“国家”和“年份”汇总,但仅限于2001年至2005年。
aggregate(Value ~ Country + Year,
dat[dat$KeyItem == "Pretax" & dat$Year >= 2001 & dat$Year <=2005, ],
mean)
# Country Year Value
# 1 India 2001 399.7184
# 2 UnitedStates 2001 464.1638
# 3 India 2002 443.5636
# 4 UnitedStates 2002 560.8373
# 5 India 2003 562.5964
# 6 UnitedStates 2003 370.9591
# 7 India 2004 404.0050
# 8 UnitedStates 2004 520.4933
# 9 India 2005 567.6595
# 10 UnitedStates 2005 493.0583
data.table
library(data.table)
DT <- data.table(dat, key = "Country,Bank,Year")
subset(DT, KeyItem == "Pretax")[Year %between% c(2001, 2005),
mean(Value), by = list(Country, Year)]
# Country Year V1
# 1: India 2001 399.7184
# 2: India 2002 443.5636
# 3: India 2003 562.5964
# 4: India 2004 404.0050
# 5: India 2005 567.6595
# 6: UnitedStates 2001 464.1638
# 7: UnitedStates 2002 560.8373
# 8: UnitedStates 2003 370.9591
# 9: UnitedStates 2004 520.4933
# 10: UnitedStates 2005 493.0583