Fitting a VLMC to very long sequences

Time: 2017-01-31 22:20:16

Tags: r markov-chains traminer sequence-analysis

I am trying to fit a VLMC to a dataset in which the longest sequence is 296 states long. I am doing it as follows:

# Load libraries
library(PST)
library(RCurl)
library(TraMineR)

# Load and transform data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
data <- read.csv(text = x)

data.seq <- seqdef(data[,2:ncol(data)], missing = NA, right = NA, nr = "*")
S1 <- pstree(data.seq, ymin = 0.01, lik = TRUE, with.missing = TRUE, nmin = 2)

However, this produces the following error:

Error in res[i, , drop = FALSE] : subscript out of bounds

How can I fit the model to data with sequences this long? Is there any principled reason to limit the length (order) of the model?

1 Answer:

Answer 0 (score: 3)

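The problem comes from your data. If you do not set L in the pstree() function, you are asking for a model of maximal order. The fitting procedure fails beyond L = 8 because you set nmin = 2, and at order 8 only one context occurs at least twice (n >= 2):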

> cprob(data.seq, L=8, nmin=2)
 [>] 21 sequences, min/max length: 19/296
 [>] computing prob., L=8, 2043 distinct context(s)
 [>] removing 1894 context(s) where n<2
 [>] total time: 0.156 secs
                        EX  FA I1  I2 I3 N1 N2 N3 NR QU TR [n]
I2-I3-FA-I3-EX-I3-EX-I2  0 0.5  0 0.5  0  0  0  0  0  0  0   2
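The sparsity behind this error can be illustrated without PST. The following is a minimal base-R sketch, not PST's own implementation: for each order L it counts how often every context (a run of L consecutive states) is followed by a next state, pooled over all sequences, and reports how many distinct contexts exist and how many reach nmin. The helper names count_contexts and summarize_orders and the toy sequences are hypothetical, and the counting only roughly mirrors what cprob() does with its default settings.

```r
# Hypothetical helper (not part of PST): count how often each context of
# length L is followed by a next state, pooled over all sequences.
count_contexts <- function(seqs, L) {
  stopifnot(L >= 1)
  ctx <- unlist(lapply(seqs, function(s) {
    n <- length(s)
    if (n <= L) return(character(0))  # sequence too short for this order
    # context = the L states just before position i, for i = L+1 .. n
    vapply((L + 1):n,
           function(i) paste(s[(i - L):(i - 1)], collapse = "-"),
           character(1))
  }))
  table(ctx)
}

# Hypothetical helper: for orders 1..max_L, how many distinct contexts
# exist and how many of them are seen at least nmin times.
summarize_orders <- function(seqs, max_L, nmin = 2) {
  do.call(rbind, lapply(seq_len(max_L), function(L) {
    tab <- count_contexts(seqs, L)
    data.frame(L = L, distinct = length(tab), kept = sum(tab >= nmin))
  }))
}

# Toy data (hypothetical), standing in for the real sequences
toy <- list(c("A", "B", "A", "B", "C"), c("A", "B", "C", "C"))
print(summarize_orders(toy, max_L = 5, nmin = 2))
```

On the toy data, two order-1 contexts and one order-2 context are seen at least twice, and none of order 3 or higher, so a tree with nmin = 2 has nothing to grow past L = 2. The cprob() output above shows the same situation for the real data at L = 8.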

Fitting the model with L = 8 works fine:

S1 <- pstree(data.seq, ymin = 0.01, lik = TRUE, nmin = 2, L=8) 

 [>] 21 sequence(s) - min/max length: 19/296
 [>] max. depth L=8, nmin=2, ymin=0.01
     [L]  [nodes]
       0        1
       1       11
       2       99
       3      368
       4      340
       5      126
       6       34
       7        4
       8        1
 [>] computing sequence(s) likelihood ... (0.804 secs)
 [>] total time: 2.968 secs
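Instead of finding a workable L by trial and error, one could search for it. The sketch below, again plain base R with a hypothetical helper name (max_supported_order) and toy data, returns the largest order at which at least one context still occurs nmin times or more. It stops at the first order that fails, which is safe because every occurrence of a context of length L + 1 is also an occurrence of its length-L suffix, so the counts can only shrink as L grows.

```r
# Hypothetical helper (not part of PST): largest order L such that at
# least one context of length L is followed by a next state >= nmin times.
max_supported_order <- function(seqs, nmin = 2, max_L = 50) {
  best <- 0
  for (L in seq_len(max_L)) {
    ctx <- unlist(lapply(seqs, function(s) {
      n <- length(s)
      if (n <= L) character(0)
      else vapply((L + 1):n,
                  function(i) paste(s[(i - L):(i - 1)], collapse = "-"),
                  character(1))
    }))
    # Counts never increase with L, so the first failing order ends the search
    if (length(ctx) == 0 || max(table(ctx)) < nmin) break
    best <- L
  }
  best
}

# Toy data (hypothetical)
toy <- list(c("A", "B", "A", "B", "C"), c("A", "B", "C", "C"))
print(max_supported_order(toy, nmin = 2))
```

To apply this to the question's data you would first turn each row of data.seq into a character vector with the void and missing symbols removed, then pass the result as L to pstree(). How closely this simple count matches PST's internal counting is an assumption, so treat the value as a starting point and confirm it with cprob().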

Also, you don't need the 'right' or 'nr' options in seqdef(), nor 'with.missing' in pstree().

Best, Alexis