I have a dataset like the one below, and I need to compare each year's value (2005-2009) with the average over 2002-2004.
Year Firm R
2002 A 30
2003 A 11
2004 A 1
2005 A 7
2006 A 15
2007 A 20
2008 A 3.5
2009 A 8
2002 B 24
2003 B 30
2004 B 25
2005 B 5.2
2006 B 11.8
2007 B 78
2008 B 90
2009 B 57
I need to compute the 2002-2004 average for each firm and replace the 2002-2004 values with that computed average. For example, the new dataset should look like this:
Year Firm R
2002 A 14
2003 A 14
2004 A 14
2005 A 7
2006 A 15
2007 A 20
2008 A 3.5
2009 A 8
2002 B 26.333
2003 B 26.333
2004 B 26.333
2005 B 5.2
2006 B 11.8
2007 B 78
2008 B 90
2009 B 57
I tried the following code:
df$R[df$Year==2002 & df$Year==2003 & df$Year==2004] = (df$R[df$Year==2002] + df$R[df$Year==2003] + df$R[df$Year==2004])/3
But nothing changes when I run it. I hope you can help me solve this.
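Answer 0 (score: 1)
There are two errors in your code: you did not group by Firm, and you used & where you needed |. A single row's Year can never equal 2002, 2003 and 2004 at the same time, so the & condition selects no rows and nothing is assigned. In my example, test.txt is a file containing the same input.
The code below should do what you need.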
df <- read.delim('test.txt', header = TRUE, sep = '\t')
print(df)
# get unique firm names for grouping
firms <- unique(df$Firm)
# for each firm, replace the 2002-2004 values with their mean
for (f in firms) {
  # rows of this firm that fall in the 2002-2004 baseline period
  idx <- df$Firm == f & df$Year %in% 2002:2004
  df$R[idx] <- mean(df$R[idx])
}
print(df)
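To make the `&` versus `|` point concrete, here is a minimal base R sketch. It assumes the data is rebuilt inline rather than read from test.txt, and the names `base`, `n_and` and `n_or` are only illustrative. It also uses `ave()` in place of the explicit loop, which is one possible alternative rather than the method above.

```r
# Rebuild the example data from the question (assumption: typed in by hand)
df <- data.frame(
  Year = rep(2002:2009, times = 2),
  Firm = rep(c("A", "B"), each = 8),
  R    = c(30, 11, 1, 7, 15, 20, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 78, 90, 57)
)

# A single Year can never equal 2002, 2003 and 2004 at once,
# so the `&` condition from the question matches no rows at all
n_and <- sum(df$Year == 2002 & df$Year == 2003 & df$Year == 2004)

# `%in%` is equivalent to chaining `|` and matches the six baseline rows
base <- df$Year %in% 2002:2004
n_or <- sum(base)

# ave() returns the per-group mean aligned with the input rows,
# so each firm's 2002-2004 values are replaced by that firm's own mean
df$R[base] <- ave(df$R[base], df$Firm[base], FUN = mean)
print(df)
```

Because `ave()` groups by Firm internally, no loop over firm names is needed.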
Answer 1 (score: 1)
If you prefer, you can use data.table:
library(data.table)
year <- c(rep(seq(2002,2009,1),2))
firm <- c(rep("A",8),rep("B",8))
r <- c(30,11,1,7,15,20,3.5,8,24,30,25,5.2,11.8,78,90,57)
aa <- data.table(year,firm,r)
aa[year>=2002 & year<=2004, r:= mean(r), by = firm]
This gives the following result:
year firm r
1: 2002 A 14.00000
2: 2003 A 14.00000
3: 2004 A 14.00000
4: 2005 A 7.00000
5: 2006 A 15.00000
6: 2007 A 20.00000
7: 2008 A 3.50000
8: 2009 A 8.00000
9: 2002 B 26.33333
10: 2003 B 26.33333
11: 2004 B 26.33333
12: 2005 B 5.20000
13: 2006 B 11.80000
14: 2007 B 78.00000
15: 2008 B 90.00000
16: 2009 B 57.00000
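Since the stated goal is to compare the 2005-2009 values with the 2002-2004 average, it may be convenient to keep the original values and store the average in its own column instead of overwriting `r`. The following is only a sketch of that idea; the column names `baseline` and `diff_from_baseline` are hypothetical and not part of the answer above.

```r
library(data.table)

# Same example data as above
aa <- data.table(
  year = rep(2002:2009, times = 2),
  firm = rep(c("A", "B"), each = 8),
  r    = c(30, 11, 1, 7, 15, 20, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 78, 90, 57)
)

# baseline (hypothetical name): per-firm mean of r over 2002-2004,
# repeated on every row of that firm
aa[, baseline := mean(r[year >= 2002 & year <= 2004]), by = firm]

# diff_from_baseline (hypothetical name): each year's value minus the baseline
aa[, diff_from_baseline := r - baseline]

# show only the years that are being compared against the baseline
print(aa[year >= 2005])
```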
Answer 2 (score: 0)
Try this dplyr version:
library(tidyverse)
data %>%
  filter(Year < 2005) %>%                    # keep only the 2002-2004 rows
  group_by(Firm) %>%                         # compute one value per firm
  summarise(m = mean(R)) %>%                 # per-firm mean, stored in column m
  left_join(data, by = "Firm") %>%           # join the original data back onto the means
  mutate(R = ifelse(Year < 2005, m, R)) %>%  # use the mean for 2002-2004, keep R otherwise
  select(Year, Firm, R) -> newdata           # keep the original columns and store the result
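The summarise-and-join steps can probably be collapsed into a single grouped `mutate()`. The sketch below assumes the data frame is called `data`, as in the answer, and rebuilds it inline so the example is self-contained; it is one possible shortening, not the answer's own code.

```r
library(dplyr)

# Example data from the question (assumption: rebuilt by hand)
data <- data.frame(
  Year = rep(2002:2009, times = 2),
  Firm = rep(c("A", "B"), each = 8),
  R    = c(30, 11, 1, 7, 15, 20, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 78, 90, 57)
)

newdata <- data %>%
  group_by(Firm) %>%
  # within each firm, replace the 2002-2004 values with their mean
  # and leave the later years unchanged
  mutate(R = if_else(Year < 2005, mean(R[Year < 2005]), R)) %>%
  ungroup()

print(newdata)
```

This keeps the rows in their original order and avoids the temporary column `m`.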