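Calculating and replacing values in R

Time: 2017-02-27 01:21:26

Tags: r

I have a dataset like the one below, and I need to compare each year's value (2005-2009) with the average of 2002-2004.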

Year   Firm    R    
2002   A       30    
2003   A       11    
2004   A       1     
2005   A       7     
2006   A       15    
2007   A       20    
2008   A       3.5   
2009   A       8     
2002   B       24    
2003   B       30    
2004   B       25    
2005   B       5.2   
2006   B       11.8  
2007   B       78    
2008   B       90    
2009   B       57  

I need to calculate the 2002-2004 average for each firm and replace the 2002-2004 values with that average. For example, the new dataset should look like this:

Year   Firm    R    
2002   A       14    
2003   A       14    
2004   A       14    
2005   A       7     
2006   A       15    
2007   A       20    
2008   A       3.5   
2009   A       8     
2002   B       26.333
2003   B       26.333
2004   B       26.333
2005   B       5.2   
2006   B       11.8  
2007   B       78    
2008   B       90    
2009   B       57

I tried the following code:

df$R[df$Year==2002 & df$Year==2003 & df$Year==2004] = (df$R[df$Year==2002] + df$R[df$Year==2003] + df$R[df$Year==2004])/3

But nothing changes when I run it. I hope you can help me solve this.

3 Answers:

Answer 0 (score: 1)

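There are two errors in your code: you do not group by Firm, and you combine the year conditions with & instead of |. A single row can never have Year equal to 2002, 2003 and 2004 at the same time, so the & condition selects no rows and nothing is replaced. In my example, test.txt is a file containing the same input data.

The code below should do what you need.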

# read the tab-separated input file
df <- read.delim('test.txt', header = TRUE, sep = '\t')

print(df)

# get unique firm names for grouping
firms <- unique(df$Firm)

# for each firm, replace the 2002-2004 values with their mean
for (f in firms) {
    base <- df$Firm == f & df$Year %in% 2002:2004
    df$R[base] <- mean(df$R[base])
}

print(df)
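
To make the point about & versus | concrete, here is a minimal sketch that rebuilds the question's data inline instead of reading test.txt. The names and_mask and base_years are only illustrative. It also shows one possible loop-free variant using base R's ave(), which is a swapped-in alternative to the for loop and not part of the original answer.

```r
# Toy data copied from the question
df <- data.frame(
  Year = rep(2002:2009, times = 2),
  Firm = rep(c("A", "B"), each = 8),
  R    = c(30, 11, 1, 7, 15, 20, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 78, 90, 57)
)

# A year cannot equal 2002, 2003 and 2004 at once, so this mask is all FALSE
and_mask <- df$Year == 2002 & df$Year == 2003 & df$Year == 2004
sum(and_mask)    # counts the rows the original condition selects

# %in% is shorthand for the chain of | comparisons
base_years <- df$Year %in% 2002:2004
sum(base_years)  # counts the baseline rows across both firms

# ave() computes the mean of R within each firm and repeats it per row
df$R[base_years] <- ave(df$R[base_years], df$Firm[base_years], FUN = mean)

print(df)
```

Because the mask is built once for the whole data frame, this version needs no explicit loop over firms, at the cost of being a little less explicit than the for loop.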

Answer 1 (score: 1)

If you like, you can use data.table:

library(data.table)

year <- c(rep(seq(2002,2009,1),2))
firm <- c(rep("A",8),rep("B",8))
r <- c(30,11,1,7,15,20,3.5,8,24,30,25,5.2,11.8,78,90,57)

aa <- data.table(year,firm,r)

aa[year>=2002 & year<=2004, r:= mean(r), by = firm]

This gives the following result:

    year firm        r
 1: 2002    A 14.00000
 2: 2003    A 14.00000
 3: 2004    A 14.00000
 4: 2005    A  7.00000
 5: 2006    A 15.00000
 6: 2007    A 20.00000
 7: 2008    A  3.50000
 8: 2009    A  8.00000
 9: 2002    B 26.33333
10: 2003    B 26.33333
11: 2004    B 26.33333
12: 2005    B  5.20000
13: 2006    B 11.80000
14: 2007    B 78.00000
15: 2008    B 90.00000
16: 2009    B 57.00000

Answer 2 (score: 0)

Try this dplyr version:

library(tidyverse)

data %>%
    filter(Year < 2005) %>% # keep only the baseline years 2002-2004
    group_by(Firm) %>% # compute one summary row per firm
    summarise(m = mean(R)) %>% # baseline mean, stored in column m
    left_join(data, by = "Firm") %>% # attach each firm's mean to the original rows
    mutate(R = ifelse(Year < 2005, m, R)) %>% # use the mean for baseline years, otherwise keep R
    select(Year, Firm, R) -> newdata # keep the original columns and store the result
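
As a possible shorter variant of the same idea, a grouped mutate() avoids the separate summarise and join steps. This is only a sketch: it assumes the data frame is called data, as in this answer, and rebuilds it inline from the question's values so the example runs on its own.

```r
library(dplyr)

# Toy data copied from the question
data <- data.frame(
  Year = rep(2002:2009, times = 2),
  Firm = rep(c("A", "B"), each = 8),
  R    = c(30, 11, 1, 7, 15, 20, 3.5, 8,
           24, 30, 25, 5.2, 11.8, 78, 90, 57)
)

newdata <- data %>%
  group_by(Firm) %>%                 # everything below is evaluated per firm
  mutate(R = ifelse(Year <= 2004,
                    mean(R[Year <= 2004]),  # baseline mean of this firm
                    R)) %>%                 # later years stay unchanged
  ungroup()

print(newdata)
```

Since mutate() keeps every row, the result has the same 16 rows and the same row order as the input, so no join or column selection is needed afterwards.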