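I have a df with many observations and 124 columns, the last 119 of which are consecutive months starting in February 2005 and ending in November 2014. Each row contains an identifying code, a start month (in yearmon form), and then three columns of model results: the actual value, the model's predicted value, and the residual. The remaining columns are the months, as noted above. I tried to provide the first 100 rows below, but the output with all 119 date columns was too long, apologies.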
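For each code/start month combination, there are values in the twelve months following the start month. For example, my first row has a start month of January 2012, so all of its month columns are NA except those for February 2012 through January 2013. The next row is the same code with a start month of February 2012 (and different model results), so the columns for March 2012 through February 2013 have values and all the others are NA.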
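I can't figure out how to collapse the non-NA columns so that each row has just its 12 populated values. The eventual goal is to run a cumulative product function across the months that have values in the df below, but for that I only need the 12 columns that matter for each row.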
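Is there a good function that standardizes this regardless of the actual month (column), returning the same five leading columns followed by 12 columns holding the needed values, which come from different calendar months in every row?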
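There is also a complicating factor: not every row has all 12 consecutive months filled in. For example, the 4th month of a 12-month series might have no value, but there are still only 12 months in that row that could hold a value.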
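To make the target shape concrete, here is a minimal base R sketch of the kind of reshaping I am after. It is only an illustration under several assumptions: `align_months`, `n_id` and `width` are hypothetical names, the month columns are assumed to be named like "Dec 2012" with English month abbreviations, and `startmonth` is assumed to use the yearmon numeric encoding (year + (month - 1)/12). The window is located from `startmonth` rather than from the first non-NA value, so a gap inside the window stays NA in its relative position. The toy data is made up and uses a 3-month window to stay short.

```r
# Hypothetical helper: map the wide calendar-month columns onto relative-month
# columns m1..m<width>, counted from the month after `startmonth`.
align_months <- function(df, n_id = 5, width = 12) {
  month_cols <- names(df)[-seq_len(n_id)]

  # Turn "Dec 2012" style names into a running month count: year * 12 + (month - 1).
  parts <- strsplit(month_cols, " ", fixed = TRUE)
  col_idx <- vapply(parts, function(p) {
    as.integer(p[2]) * 12L + match(p[1], month.abb) - 1L
  }, integer(1))

  # yearmon stores year + (month - 1) / 12, so the same count is value * 12.
  start_idx <- as.integer(round(as.numeric(unclass(df$startmonth)) * 12))

  # The month values may be stored as character strings, so convert to double.
  vals <- as.matrix(df[month_cols])
  storage.mode(vals) <- "double"

  out <- matrix(NA_real_, nrow(df), width,
                dimnames = list(NULL, paste0("m", seq_len(width))))
  for (k in seq_len(width)) {
    pos <- match(start_idx + k, col_idx)  # NA when that month has no column
    ok <- !is.na(pos)
    out[ok, k] <- vals[cbind(which(ok), pos[ok])]
  }
  cbind(df[seq_len(n_id)], out)
}

# Made-up toy data: startmonth is a plain numeric in the yearmon encoding,
# so the sketch runs without the zoo package.
toy <- data.frame(
  code = c("qqq", "qqq"),
  startmonth = c(2012, 2012 + 1/12),  # Jan 2012, Feb 2012
  actual = c(1, 2), model = c(1, 2), residual = c(0, 0),
  `Feb 2012` = c("1.5", NA),
  `Mar 2012` = c(NA, "2.0"),   # row 1 has a gap in its 2nd month
  `Apr 2012` = c("0.5", "3.0"),
  `May 2012` = c(NA, "4.0"),
  check.names = FALSE, stringsAsFactors = FALSE
)

res <- align_months(toy, n_id = 5, width = 3)
print(res)
```

In this sketch the gap in row 1 is kept as NA in `m2` rather than being shifted left, so a later cumulative product would still need to decide how to treat missing months.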
Thanks very much.
Edit: a sample of the data frame:
structure(list(code = c("qqq", "qqq", "qqq", "qqq", "qqq", "qqq",
"qqq", "qqq", "qqq", "qqq"), startmonth = structure(c(2012, 2012.08333333333,
2012.16666666667, 2012.25, 2012.33333333333, 2012.41666666667,
2012.5, 2012.58333333333, 2012.66666666667, 2012.75), class = "yearmon"),
actual = c(531.8070679, 481.1286926, 466.6588745, 503.1075134,
569.5734863, 569.586792, 586.2329102, 561.9111328, 499.8103027,
498.1677856), model = c(686.9315941, 642.5716051, 616.4195817,
639.1987258, 684.4228377, 671.7456088, 669.476986, 649.4140804,
606.2803121, 557.0674166), residual = c(-155.1245262, -161.4429125,
-149.7607072, -136.0912124, -114.8493514, -102.1588168, -83.24407583,
-87.50294761, -106.4700094, -58.89963098), `Dec 2012` = c("3.314476013",
"3.314476013", "3.314476013", "3.314476013", "3.314476013",
"3.314476013", "3.314476013", "3.314476013", "3.314476013",
"3.314476013"), `Jan 2013` = c("2.016448021", "2.016448021",
"2.016448021", "2.016448021", "2.016448021", "2.016448021",
"2.016448021", "2.016448021", "2.016448021", "2.016448021"
), `Feb 2013` = c(NA, "0.041545", "0.041545", "0.041545",
"0.041545", "0.041545", "0.041545", "0.041545", "0.041545",
"0.041545"), `Mar 2013` = c(NA, NA, "1.944175005", "1.944175005",
"1.944175005", "1.944175005", "1.944175005", "1.944175005",
"1.944175005", "1.944175005"), `Apr 2013` = c(NA, NA, NA,
"0.898332", "0.898332", "0.898332", "0.898332", "0.898332",
"0.898332", "0.898332"), `May 2013` = c(NA, NA, NA, NA, "1.043239951",
"1.043239951", "1.043239951", "1.043239951", "1.043239951",
"1.043239951"), `Jun 2013` = c(NA, NA, NA, NA, NA, "0.722914994",
"0.722914994", "0.722914994", "0.722914994", "0.722914994"
), `Jul 2013` = c(NA, NA, NA, NA, NA, NA, "-0.349180996",
"-0.349180996", "-0.349180996", "-0.349180996"), `Aug 2013` = c(NA,
NA, NA, NA, NA, NA, NA, "0.074822001", "0.074822001", "0.074822001"
), `Sep 2013` = c(NA, NA, NA, NA, NA, NA, NA, NA, "-1.258324027",
"-1.258324027"), `Oct 2013` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, "1.153113008")), .Names = c("code", "startmonth",
"actual", "model", "residual", "Dec 2012", "Jan 2013", "Feb 2013",
"Mar 2013", "Apr 2013", "May 2013", "Jun 2013", "Jul 2013", "Aug 2013",
"Sep 2013", "Oct 2013"), row.names = c(NA, 10L), class = "data.frame")