I have an unbalanced quarterly panel data set that looks like this:
Firm Date Var_1
AAA 19701130 24.46
AAA 19701231 NA
AAA 19710131 NA
AAA 19710228 34.19325
AAA 19710331 NA
AAA 19710430 NA
AAA 19710531 29.0235
AAA 19710630 NA
AAA 19710731 NA
AAA 19710831 16.256875
AAA 19710930 NA
AAA 19711031 NA
AAA 19711130 17.22125
AAA 19711231 NA
BBB 19730630 4.57
BBB 19730731 NA
BBB 19730831 NA
BBB 19730930 8.736
BBB 19731031 NA
BBB 19731130 NA
BBB 19731231 4.97
BBB 19740131 NA
BBB 19740228 NA
BBB 19740331 6.85125
BBB 19740430 NA
BBB 19740531 NA
BBB 19740630 6.87225
BBB 19740731 NA
BBB 19740831 NA
BBB 19740930 5.454875
BBB 19741031 NA
BBB 19741130 NA
BBB 19741231 4.56875
BBB 19750131 NA
BBB 19750228 NA
BBB 19750331 6.276
BBB 19750430 NA
BBB 19750531 NA
BBB 19750630 6.0145
BBB 19750731 NA
BBB 19750831 NA
BBB 19750930 8.376
BBB 19751031 NA
BBB 19751130 NA
BBB 19751231 9.17875
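The real data runs to tens of thousands of rows. The point to note is that each firm reports at a different month end. How can I compute the mean of Var_1 for each firm and each year? The final result should be yearly rather than quarterly. The ideal result would look like this: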
Firm Date Var_1
AAA 1970 24.46
AAA 1971 24.17
BBB 1973 6.09
BBB 1974 5.94
BBB 1975 7.46
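Answer 0 (score: 0)
We can use one of the group-by functions. After grouping by "Firm" and by the substring of "Date" that holds the year (its first four characters), take the mean of "Var_1", dropping the NA values.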
library(dplyr)
# Group by firm and year (first four characters of Date), then average Var_1
df1 %>%
  group_by(Firm, Date = substr(Date, 1, 4)) %>%
  summarise(Var_1 = round(mean(Var_1, na.rm = TRUE), 2))
# Firm Date Var_1
# <chr> <chr> <dbl>
#1 AAA 1970 24.46
#2 AAA 1971 24.17
#3 BBB 1973 6.09
#4 BBB 1974 5.94
#5 BBB 1975 7.46
Or with aggregate from base R (the means are not rounded here):
aggregate(Var_1~., transform(df1, Date = substr(Date, 1, 4)), FUN = mean, na.rm = TRUE)
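To check either approach against the desired output, here is a minimal self-contained sketch in base R. It assumes a small hypothetical subset of the question's data and that Date is stored as a number in yyyymmdd form; the Year column and the integer-division step are illustrative choices, not part of the original answer. Integer division gets the year without relying on number-to-string conversion.

```r
# Hypothetical subset of the question's data (name df1 taken from the answer).
df1 <- data.frame(
  Firm  = c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB"),
  Date  = c(19701130, 19701231, 19710228, 19710531, 19710831, 19711130,
            19730630, 19730731, 19730930, 19731231),
  Var_1 = c(24.46, NA, 34.19325, 29.0235, 16.256875, 17.22125,
            4.57, NA, 8.736, 4.97),
  stringsAsFactors = FALSE
)

# Assumption: Date is numeric yyyymmdd, so integer division by 10000 gives the year.
df1$Year <- df1$Date %/% 10000

# The formula interface drops rows with NA in Var_1 by default (na.action = na.omit).
res <- aggregate(Var_1 ~ Firm + Year, data = df1, FUN = mean)

# Round to two decimals and sort by firm, then year, to match the desired layout.
res$Var_1 <- round(res$Var_1, 2)
res <- res[order(res$Firm, res$Year), ]
rownames(res) <- NULL
print(res)
```

If Date is stored as a character string instead, substr(Date, 1, 4) as used in the answer gives the same grouping.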