How to calculate yearly means for unbalanced panel data in R?

Posted: 2016-11-29 03:25:25

Tags: r panel data-cleaning

I have a quarterly, unbalanced panel dataset that looks like this:

Firm    Date    Var_1               
AAA 19701130    24.46               
AAA 19701231    NA              
AAA 19710131    NA              
AAA 19710228    34.19325                
AAA 19710331    NA              
AAA 19710430    NA              
AAA 19710531    29.0235             
AAA 19710630    NA              
AAA 19710731    NA              
AAA 19710831    16.256875               
AAA 19710930    NA              
AAA 19711031    NA              
AAA 19711130    17.22125                
AAA 19711231    NA              
BBB 19730630    4.57                
BBB 19730731    NA              
BBB 19730831    NA              
BBB 19730930    8.736               
BBB 19731031    NA              
BBB 19731130    NA              
BBB 19731231    4.97                
BBB 19740131    NA              
BBB 19740228    NA              
BBB 19740331    6.85125             
BBB 19740430    NA              
BBB 19740531    NA              
BBB 19740630    6.87225             
BBB 19740731    NA              
BBB 19740831    NA              
BBB 19740930    5.454875                
BBB 19741031    NA              
BBB 19741130    NA              
BBB 19741231    4.56875             
BBB 19750131    NA              
BBB 19750228    NA              
BBB 19750331    6.276               
BBB 19750430    NA              
BBB 19750531    NA              
BBB 19750630    6.0145              
BBB 19750731    NA              
BBB 19750831    NA              
BBB 19750930    8.376               
BBB 19751031    NA              
BBB 19751130    NA              
BBB 19751231    9.17875             

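The real data runs to tens of thousands of rows. The point to note is that each firm reports at different month-ends. How can I calculate the yearly mean of Var_1 for each firm? The final result should be annual, not quarterly. The desired result looks like this: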

Firm    Date    Var_1   
AAA     1970    24.46   
AAA     1971    24.17   
BBB     1973    6.09    
BBB     1974    5.94    
BBB     1975    7.46    
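To make the answers below reproducible, here is a minimal sketch that rebuilds the sample panel from the question as df1. It assumes Date is stored as an integer in yyyymmdd form (the question does not say which type it is), and the helpers month_ends and every_third are hypothetical, introduced only for this example.

```r
# Rebuild the sample panel: 45 monthly rows, with a reported value every third month.
# Assumption: Date is an integer in yyyymmdd form, as printed in the question.

# Hypothetical helper: month-end dates = first day of each following month minus one day
month_ends <- function(first_of_next_month, n) {
  d <- seq(as.Date(first_of_next_month), by = "month", length.out = n) - 1
  as.integer(format(d, "%Y%m%d"))
}

# Hypothetical helper: put the reported values in months 1, 4, 7, ...; all other months are NA
every_third <- function(values, n) {
  v <- rep(NA_real_, n)
  v[seq(1, n, by = 3)] <- values
  v
}

df1 <- data.frame(
  Firm  = rep(c("AAA", "BBB"), times = c(14, 31)),
  Date  = c(month_ends("1970-12-01", 14), month_ends("1973-07-01", 31)),
  Var_1 = c(
    every_third(c(24.46, 34.19325, 29.0235, 16.256875, 17.22125), 14),
    every_third(c(4.57, 8.736, 4.97, 6.85125, 6.87225, 5.454875,
                  4.56875, 6.276, 6.0145, 8.376, 9.17875), 31)
  ),
  stringsAsFactors = FALSE
)
```

AAA runs from 19701130 to 19711231 and BBB from 19730630 to 19751231, with NA in the months between reports, as in the listing above.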

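1 answer:

Answer 0 (score: 0)

We can use one of the group-by approaches. After grouping by 'Firm' and the first four characters of 'Date' (i.e. the year), take the mean of 'Var_1'.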

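library(dplyr)
df1 %>% 
    # substr() coerces the yyyymmdd Date to character; characters 1-4 are the year
    group_by(Firm, Date = substr(Date, 1, 4)) %>% 
    # na.rm = TRUE skips the months in which nothing was reported
    summarise(Var_1 = round(mean(Var_1, na.rm = TRUE), 2))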
#   Firm  Date Var_1
#  <chr> <chr> <dbl>
#1   AAA  1970 24.46
#2   AAA  1971 24.17
#3   BBB  1973  6.09
#4   BBB  1974  5.94
#5   BBB  1975  7.46
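If Date is an integer in yyyymmdd form (an assumption; the question does not state the column type), the year can also be taken arithmetically with %/% 10000 instead of substr(), which keeps it numeric. The sketch below swaps in base R split() and vapply() for the dplyr grouping; the helpers month_ends and every_third are hypothetical and only rebuild the question's sample data so the block runs on its own.

```r
# Hypothetical helpers that rebuild the question's sample panel
month_ends <- function(first_of_next_month, n) {
  d <- seq(as.Date(first_of_next_month), by = "month", length.out = n) - 1
  as.integer(format(d, "%Y%m%d"))
}
every_third <- function(values, n) {
  v <- rep(NA_real_, n); v[seq(1, n, by = 3)] <- values; v
}
df1 <- data.frame(
  Firm  = rep(c("AAA", "BBB"), times = c(14, 31)),
  Date  = c(month_ends("1970-12-01", 14), month_ends("1973-07-01", 31)),
  Var_1 = c(every_third(c(24.46, 34.19325, 29.0235, 16.256875, 17.22125), 14),
            every_third(c(4.57, 8.736, 4.97, 6.85125, 6.87225, 5.454875,
                          4.56875, 6.276, 6.0145, 8.376, 9.17875), 31))
)

# Integer division drops the trailing mmdd digits, leaving the year as a number
df1$Year <- df1$Date %/% 10000L

# One group per Firm-Year pair that actually occurs (drop = TRUE removes empty pairs)
groups <- split(df1$Var_1, list(df1$Firm, df1$Year), drop = TRUE)

# Yearly mean per group, ignoring NA months, rounded like the desired output
yearly <- vapply(groups, function(v) round(mean(v, na.rm = TRUE), 2), numeric(1))
yearly
```

On the sample data the groups are named AAA.1970, AAA.1971, BBB.1973, BBB.1974 and BBB.1975, with means 24.46, 24.17, 6.09, 5.94 and 7.46, the same values as the dplyr output.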

Or with aggregate from base R:

# Var_1 ~ . groups by every other column, i.e. Firm and the 4-character year
aggregate(Var_1~., transform(df1, Date = substr(Date, 1, 4)), FUN = mean, na.rm = TRUE)
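One detail of the aggregate call: the formula method of aggregate() defaults to na.action = na.omit, so rows where Var_1 (or any other variable in the formula) is NA are dropped before mean is applied, and na.rm = TRUE does not change the result. The call also returns unrounded means. A minimal sketch checking both points on the rebuilt sample data (month_ends and every_third are hypothetical helpers):

```r
# Hypothetical helpers that rebuild the question's sample panel
month_ends <- function(first_of_next_month, n) {
  d <- seq(as.Date(first_of_next_month), by = "month", length.out = n) - 1
  as.integer(format(d, "%Y%m%d"))
}
every_third <- function(values, n) {
  v <- rep(NA_real_, n); v[seq(1, n, by = 3)] <- values; v
}
df1 <- data.frame(
  Firm  = rep(c("AAA", "BBB"), times = c(14, 31)),
  Date  = c(month_ends("1970-12-01", 14), month_ends("1973-07-01", 31)),
  Var_1 = c(every_third(c(24.46, 34.19325, 29.0235, 16.256875, 17.22125), 14),
            every_third(c(4.57, 8.736, 4.97, 6.85125, 6.87225, 5.454875,
                          4.56875, 6.276, 6.0145, 8.376, 9.17875), 31))
)

# Replace Date with its first four characters (the year)
by_year <- transform(df1, Date = substr(Date, 1, 4))

# Formula method: the default na.action = na.omit removes NA rows before FUN runs,
# so these two calls should give the same means
plain   <- aggregate(Var_1 ~ ., data = by_year, FUN = mean)
with_rm <- aggregate(Var_1 ~ ., data = by_year, FUN = mean, na.rm = TRUE)

# Round to two decimals to match the layout of the desired result
with_rm$Var_1 <- round(with_rm$Var_1, 2)
with_rm
```

Rounding is applied after aggregating, so the dplyr and aggregate versions end up with the same five values.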