Question

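Working in R, I want to forecast a time series of prevalence from initial values and a set of transition parameters. For data with the following structure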

 cohort <- c(1980,1981,1982)
 A00 <- c(.15, .2,.4)
 B00 <- c(.25, .3, .4) 
 C00 <-c(.6, .5,.2)
 Tab<-c(.6,.5,.4)
 Tac<-c(.2,.25,.35)
 ds <- data.frame(cohort,A00,B00,C00,Tab,Tac)
 print (ds)

  cohort  A00  B00 C00 Tab  Tac
1   1980 0.15 0.25 0.6 0.6 0.20
2   1981 0.20 0.30 0.5 0.5 0.25
3   1982 0.40 0.40 0.2 0.4 0.35

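The initial values in columns A00, B00, and C00 represent the relative size of each group (A, B, C) at time t = 00. They sum to 1 across each row (A00 + B00 + C00 = 1). The parameters Tab and Tac are used to predict prevalence at time t + 1 using some mathematical model, for example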

A01   = df$A00 -df$Tab +df$Tac.

The function that computes the predicted values at time t + 1 is

 forecast<- function( df ) {
  dsResult <- data.frame(
    cohort= df$cohort,
    A01   = df$A00 -df$Tab +df$Tac ,    
    B01   = df$B00 -df$Tab +df$Tac,    
    C01  =  df$C00 -df$Tab +df$Tac    

  )
  dsResult<- merge(df,dsResult,by="cohort")
  return( dsResult)
}
new<-forecast(ds)

and it produces the following result

  cohort  A00  B00 C00 Tab  Tac   A01   B01  C01
1   1980 0.15 0.25 0.6 0.6 0.20 -0.25 -0.15 0.20
2   1981 0.20 0.30 0.5 0.5 0.25 -0.05  0.05 0.25
3   1982 0.40 0.40 0.2 0.4 0.35  0.35  0.35 0.15

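I would greatly appreciate help learning how to write a loop that repeats the forecast for the desired number of years (e.g., for t in 1:7). Thanks in advance!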

Answer 1

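To start, I have two suggestions that might make the problem easier to code. First, reshape the data so that each year is its own row and each group is its own column. Second, since the cohorts are mathematically independent of each other, keep them separate, at least until the kernel of the code is built. Later, wrap a loop around it that cycles through the cohorts. In the first block of code there are two matrices: one holds the observed data and the other collects the predicted data.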

yearCount <- 7 #Declare the number of time points.
groupCount <- 3 #Declare the number of groups.

#Create fake data that sum to 1 across rows/times.
ob <- matrix(runif(yearCount*groupCount), ncol=groupCount)
ob <- ob / apply(ob, 1, function( x ){ return( sum(x) )})

#Establish a container to hold the predicted values.
pred <- matrix(NA_real_, ncol=groupCount, nrow=yearCount)

t12<-.5; t13<-.2; t11<-1-t12-t13 #Transition parameters from group 1
t21<-.2; t23<-.4; t22<-1-t21-t23 #Transition parameters from group 2
t31<-.3; t32<-.1; t33<-1-t31-t32 #Transition parameters from group 3

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  pred[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  pred[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33
}

#Calculate the squared errors
ss <- (pred[-1, ] - ob[-1, ])^2 #Ignore the first year of data

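Inside the loop, you might notice the familiar structure of matrix multiplication. Each line can be condensed slightly using an inner product (i.e., one row of the ob matrix is multiplied element by element with one "column" of the t's, and the products are summed). I'm using t12 slightly differently from Tab in your post; it is the probability of transitioning from group 1 to group 2 at a given time point.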

#Create transition parameters that sum to 1 across rows/groups.
tt <-  matrix(runif(groupCount*groupCount), ncol=groupCount)
tt <- tt / apply(tt, 1, function( x ){ return( sum(x) )})

Assume the tt matrix was defined earlier, in place of the separate variables t11, ..., t33.

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, ] %*% tt[, 1] 
  pred[i, 2] <- ob[i-1, ] %*% tt[, 2]
  pred[i, 3] <- ob[i-1, ] %*% tt[, 3]
}
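To make the link between the two versions concrete, here is a minimal sketch that builds tt from the scalar parameters used earlier and checks that both loops agree. The row/column layout of tt (row = origin group, column = destination group) is inferred from the first loop; the seed and the names predScalar, predInner, and loopsAgree are assumptions for illustration only.

```r
set.seed(7)  # Assumed seed, only so the fake data are reproducible.
yearCount <- 7
groupCount <- 3

# Fake observed proportions that sum to 1 across rows/times.
ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)

t12 <- .5; t13 <- .2; t11 <- 1 - t12 - t13  # Transitions from group 1
t21 <- .2; t23 <- .4; t22 <- 1 - t21 - t23  # Transitions from group 2
t31 <- .3; t32 <- .1; t33 <- 1 - t31 - t32  # Transitions from group 3

# Row = origin group, column = destination group, so tt[1, 2] is t12.
tt <- matrix(c(t11, t12, t13,
               t21, t22, t23,
               t31, t32, t33), ncol = groupCount, byrow = TRUE)

predScalar <- matrix(NA_real_, ncol = groupCount, nrow = yearCount)
predInner  <- matrix(NA_real_, ncol = groupCount, nrow = yearCount)

for (i in 2:yearCount) {
  # Explicit multiply-and-sum, as in the first loop.
  predScalar[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  predScalar[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  predScalar[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33

  # Inner product of one row of ob with one column of tt.
  for (j in seq_len(groupCount)) {
    predInner[i, j] <- ob[i-1, ] %*% tt[, j]
  }
}

# TRUE when the two approaches give the same predictions.
loopsAgree <- isTRUE(all.equal(predScalar, predInner))
```

With this layout each row of tt sums to 1, which is why the predicted proportions for a year also sum to 1.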

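The contents of the loop are slightly cleaner than when each pair of elements was explicitly multiplied and summed. But we don't have to treat each row/column pair separately. All three columns of the ob matrix can be operated on by all three columns of the tt matrix at once: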

for( i in 2:yearCount ) {
  pred[i, ] <- ob[i-1, ] %*% tt
}

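This should be much quicker than the previous versions, because R's internal memory system doesn't recreate the matrix three times for each row, only once per row. To reduce this to once per matrix, use the apply function, and then transpose the result if that suits your purpose. Finally, note that the rows represent different years than in pred (i.e., row i-1 here corresponds to row i in pred).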

predictionWIthExtraYear <- t(apply(ob, 1, FUN=function(row){row %*% tt}))
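As a quick sanity check of the row alignment described above, the following sketch compares the loop version with the apply() version on freshly generated fake data. It also shows that a single matrix product, ob %*% tt, gives the same matrix as the apply() call; that shortcut is not part of the answer above, and the seed and the names predictionDirect, sameAsLoop, and sameAsApply are assumptions for illustration.

```r
set.seed(42)  # Assumed seed, only so the fake data are reproducible.
yearCount <- 7
groupCount <- 3

# Fake observed proportions and transition probabilities; rows sum to 1.
ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)
tt <- matrix(runif(groupCount * groupCount), ncol = groupCount)
tt <- tt / rowSums(tt)

# Loop version: row i of pred is predicted from row i-1 of ob.
pred <- matrix(NA_real_, ncol = groupCount, nrow = yearCount)
for (i in 2:yearCount) {
  pred[i, ] <- ob[i-1, ] %*% tt
}

# apply() version: row i holds the prediction made FROM row i of ob,
# so the last row is a forecast for a year beyond the observed data.
predictionWIthExtraYear <- t(apply(ob, 1, FUN = function(row) { row %*% tt }))

# A single matrix product computes the same matrix in one call.
predictionDirect <- ob %*% tt

# Drop the first row of pred and the last row of the vectorized result
# so that both refer to the same years.
sameAsLoop <- isTRUE(all.equal(pred[-1, ],
                               predictionWIthExtraYear[-yearCount, ],
                               check.attributes = FALSE))
sameAsApply <- isTRUE(all.equal(predictionWIthExtraYear, predictionDirect,
                                check.attributes = FALSE))
```

Both flags should come out TRUE; if sameAsLoop is FALSE, the usual culprit is forgetting the one-row shift between the two matrices.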

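To accommodate the cohorts, maybe you could declare a list with three elements (for the 1980, 1981, and 1982 cohorts). Each element would be a separate ob matrix. Then create a second list for the separate pred matrices. Or perhaps use a three-dimensional matrix (but this might be more taxing when R recreates memory with the replacement functions).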
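A minimal sketch of the list-based layout might look like the following. It assumes each cohort also has its own transition matrix (the answer above only mentions separate ob and pred matrices); makeRowStochastic is a hypothetical helper, and obList, ttList, and predList are illustrative names, not anything from the original post.

```r
set.seed(1)  # Assumed seed, only so the fake data are reproducible.
yearCount <- 7
groupCount <- 3
cohorts <- c("1980", "1981", "1982")

# Hypothetical helper: a random matrix whose rows each sum to 1.
makeRowStochastic <- function(nrow, ncol) {
  m <- matrix(runif(nrow * ncol), ncol = ncol)
  m / rowSums(m)
}

# One observed matrix and (by assumption) one transition matrix per cohort.
obList <- lapply(cohorts, function(cohort) makeRowStochastic(yearCount, groupCount))
ttList <- lapply(cohorts, function(cohort) makeRowStochastic(groupCount, groupCount))
names(obList) <- cohorts
names(ttList) <- cohorts

# Second list: one pred matrix per cohort. Row 1 stays NA because there
# is no earlier year to predict it from; row i comes from row i-1 of ob.
predList <- Map(function(ob, tt) {
  rbind(NA_real_, ob[-nrow(ob), , drop = FALSE] %*% tt)
}, obList, ttList)

# Squared errors per cohort, ignoring the first year.
ssList <- Map(function(pred, ob) (pred[-1, ] - ob[-1, ])^2, predList, obList)
```

Each cohort can then be inspected by name, for example predList[["1981"]], and the cohorts never interact, which matches the independence noted at the top of the answer.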
