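I'm sure there is a way to do this by writing a function inside a loop, but the movavg() function does exactly what I want; I just can't figure out how to apply it across groups in a data frame.
The dataset is large, but I want to take a few columns, e.g.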
data <- c("Species", "Tonnes", "Year")
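and sum all rows (Tonnes) for each species in each year (there will be hundreds of rows per species per year), then compute a weighted moving average over 10 consecutive years. So instead of summarise(mean = mean(Tonnes)), I have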
data %>%
group_by(Species, Year) %>%
summarise(wma = movavg(x = Tonnes, n = 9, type = "w"))
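For the movavg() function, x is the time series as a numeric vector, n is the backward window length (so I think this should be 9, since I have 10 years of data), and type is "w" for a weighted moving average. But I can't figure out an efficient way to make x and n refer to the grouped variables.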
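To make the intended calculation concrete, here is a minimal base-R sketch of a linearly weighted moving average over a backward window. The helper name wma() is hypothetical and is not part of pracma; whether it matches movavg(type = "w") value for value, especially over the first few points of the series, is an assumption.

```r
# Hypothetical helper: linearly weighted moving average over a backward window.
# The most recent value gets the largest weight.
wma <- function(x, n) {
  stopifnot(is.numeric(x), n >= 1)
  vapply(seq_along(x), function(k) {
    m <- min(k, n)        # shorter window while fewer than n points exist
    w <- m:1              # weight m for x[k], down to 1 for the oldest point
    sum(w * x[k:(k - m + 1)]) / sum(w)
  }, numeric(1))
}

res <- wma(c(1, 2, 3, 4), n = 2)
res  # 1, 5/3, 8/3, 11/3
```

Each output value depends only on the current value and the n - 1 values before it, so the input needs to be a single series in chronological order with one value per time step.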
Any help would be greatly appreciated!
Sample data:
dput(sample)
structure(list(Species = c("Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Argyrosomus hololepidotus", "Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Argyrosomus hololepidotus", "Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Argyrosomus hololepidotus", "Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Dipturus batis", "Dipturus batis", "Dipturus batis", "Dipturus batis",
"Dipturus batis", "Dipturus batis", "Dipturus batis", "Dipturus batis",
"Dipturus batis", "Dipturus batis", "Dipturus batis", "Dipturus batis",
"Dipturus batis", "Dipturus batis", "Dipturus batis", "Dipturus batis",
"Epinephelus striatus", "Epinephelus striatus", "Epinephelus striatus",
"Epinephelus striatus", "Epinephelus striatus", "Epinephelus striatus",
"Epinephelus striatus", "Epinephelus striatus", "Epinephelus striatus",
"Epinephelus striatus", "Epinephelus striatus", "Epinephelus striatus",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica"
), Year = c(2011, 2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014,
2015, 2015, 2011, 2011, 2011, 2011, 2012, 2012, 2012, 2013, 2013,
2013, 2014, 2014, 2014, 2015, 2015, 2015, 2011, 2011, 2012, 2012,
2013, 2013, 2013, 2014, 2014, 2015, 2015, 2015, 2011, 2011, 2011,
2011, 2012, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013,
2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015, 2011,
2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012, 2012, 2013, 2013,
2013, 2013, 2013, 2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015,
2015), Country = c("Australia", "Australia", "Congo Rep", "Australia",
"Congo Rep", "Cuba", "Australia", "Congo Rep", "Cuba", "Australia",
"Congo Rep", "France", "Iceland", "Norway", "UK", "Iceland",
"Norway", "UK", "Iceland", "Norway", "UK", "Iceland", "Norway",
"UK", "France", "Norway", "UK", "Bahamas", "Cuba", "Bahamas",
"Cuba", "Bahamas", "Colombia", "Cuba", "Bahamas", "Cuba", "Bahamas",
"Colombia", "Cuba", "Belgium", "France", "Portugal", "Spain",
"Belgium", "France", "Portugal", "Spain", "UK", "Belgium", "France",
"Portugal", "Spain", "UK", "Belgium", "France", "Portugal", "Spain",
"UK", "Belgium", "France", "Portugal", "Spain", "UK", "France",
"Iceland", "Ireland", "Spain", "UK", "France", "Iceland", "Ireland",
"Spain", "UK", "France", "Iceland", "Ireland", "Spain", "UK",
"France", "Iceland", "Ireland", "Spain", "UK", "France", "Ireland",
"Spain", "UK"), Tonnes = c(106.05352, 156.34223, 126.58993, 186.88017,
99.93942, 2.11171, 141.38148, 76.62019, 1.05582, 139.68761, 68.84715,
1.01059, 122.28238, 51.54047, 1.01059, 146.53669, 21.22253, 4.04238,
154.62159, 16.16959, 1.01061, 221.32109, 13.4511, 1.01058, 0.10106,
11.40965, 1.48557, 123.76517, 31.19285, 75.46656, 26.16171, 92.57231,
0.20119, 13.08084, 148.92064, 25.15554, 53.32971, 0.20118, 25.15551,
121.27182, 45.47698, 41.43453, 38.40273, 127.33538, 55.58288,
31.32855, 21.22261, 4.04233, 102.07043, 59.62533, 30.31795, 11.11656,
27.28616, 85.90087, 82.86908, 29.30735, 5.05297, 38.40275, 110.25632,
60.06995, 21.18214, 3.52698, 44.86048, 184.93939, 17.18016, 8.08486,
60.63577, 90.9538, 171.80171, 21.22261, 7.07419, 56.59356, 62.65707,
158.66387, 37.39215, 3.03183, 32.33914, 85.90084, 169.78059,
14.1483, 2.02121, 18.19081, 55.58289, 195.41931, 2.99138, 12.89521,
26.33628)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -87L), vars = c("Species", "Year"), drop = TRUE, indices = list(
0L, 1:2, 3:5, 6:8, 9:10, 11:14, 15:17, 18:20, 21:23, 24:26,
27:28, 29:30, 31:33, 34:35, 36:38, 39:42, 43:47, 48:52, 53:57,
58:62, 63:67, 68:72, 73:77, 78:82, 83:86), group_sizes = c(1L,
2L, 3L, 3L, 2L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L), biggest_group_size = 5L, labels = structure(list(
Species = c("Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Argyrosomus hololepidotus", "Argyrosomus hololepidotus",
"Argyrosomus hololepidotus", "Dipturus batis", "Dipturus batis",
"Dipturus batis", "Dipturus batis", "Dipturus batis", "Epinephelus striatus",
"Epinephelus striatus", "Epinephelus striatus", "Epinephelus striatus",
"Epinephelus striatus", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja circularis", "Leucoraja circularis", "Leucoraja circularis",
"Leucoraja fullonica", "Leucoraja fullonica", "Leucoraja fullonica",
"Leucoraja fullonica", "Leucoraja fullonica"), Year = c(2011,
2012, 2013, 2014, 2015, 2011, 2012, 2013, 2014, 2015, 2011,
2012, 2013, 2014, 2015, 2011, 2012, 2013, 2014, 2015, 2011,
2012, 2013, 2014, 2015)), class = "data.frame", row.names = c(NA,
-25L), vars = c("Species", "Year"), drop = TRUE))
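Answer 0 (score: 1)
Here is a possible solution. The main difference comes down to the level at which you want to run the moving average: first aggregate to one total per species per year, then compute the moving average within each species.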
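library(dplyr)
library(pracma) # provides movavg()
data %>%
  group_by(Species, Year) %>%
  summarise(Tonnes = sum(Tonnes, na.rm = TRUE)) %>% # one total per species per year; by default the result stays grouped by Species
  arrange(Year) %>% # just in case the years are out of order, because movavg() assumes chronological order
  mutate(Tonnes_ma = movavg(Tonnes, 3, "w")) # weighted moving average within each species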
I used 3 as n here because I only have 5 years of data.
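For comparison, the same two-step idea (aggregate first, then roll within each species) can be sketched in base R. This is an alternative illustration rather than the code above: the toy data frame catch is made up, and stats::filter() is swapped in for movavg(), so the first n - 1 values of each series come out as NA instead of being computed over a shorter window.

```r
# Toy data, made up for illustration: several rows per species per year
catch <- data.frame(
  Species = rep(c("A", "B"), each = 8),
  Year    = rep(rep(2011:2014, each = 2), times = 2),
  Tonnes  = c(1, 1, 2, 2, 3, 3, 4, 4,  10, 0, 20, 0, 30, 0, 40, 0)
)

# Step 1: one total per species per year
yearly <- aggregate(Tonnes ~ Species + Year, data = catch, FUN = sum)
yearly <- yearly[order(yearly$Species, yearly$Year), ]

# Step 2: weighted moving average within each species
n <- 3
w <- (n:1) / sum(1:n)  # w[1] applies to the current year, w[n] to the oldest
yearly$Tonnes_wma <- ave(
  yearly$Tonnes, yearly$Species,
  FUN = function(x) as.numeric(stats::filter(x, w, sides = 1))
)

yearly
```

Note that stats::filter() stops with an error if a species has fewer than n yearly totals, so n has to be chosen with the shortest series in mind.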