How do I tidy a list of S4 depmix objects?

Time: 2017-05-30 09:34:45

Tags: r tidyverse

I'm not sure the title makes sense. Feel free to rephrase it.

Data are at the end of this post.

Anyway, I have fitted an HMM to each of many different sequences, like so:

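library(dplyr)      # %>%, group_by(), do()
library(depmixS4)   # depmix(), fit()

# Fit one 3-state Gaussian HMM per track and store each fit in a list column
Random_Tracks_HMM <- Random_Tracks %>%
  group_by(track_id) %>%
  do(hmm.storage = fit(depmix(data = ., steplength ~ 1, family = gaussian(), nstates = 3),
                       verbose = FALSE, method = "rsolnp"))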

From this I get a list with the following structure:

   track_id         hmm.storage
 *   <fctr>              <list>
 1   10487B <S4: depmix.fitted>
 2   11016E <S4: depmix.fitted>
 3   13161C <S4: depmix.fitted>
 4   13859A <S4: depmix.fitted>

I can easily access individual elements of the list, like so:

> BIC(Random_Tracks_HMM$hmm.storage[[1]])
[1] 41.43906
> posterior(Random_Tracks_HMM$hmm.storage[[1]])
   state        S1        S2        S3
1      3 0.3332823 0.3333089 0.3334088
2      1 0.3333353 0.3333353 0.3333293
3      1 0.3333373 0.3333326 0.3333301

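But I would like all of the elements for every track_id, for example in a long-format data frame like the one below, which can hold output of variable size (because the sequences differ in length).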

track_id state S1 S2 S3
1          .   .  .  .
1
1
2
2
3
4
4
4
4

I tried a loop, but it throws an error and creates very messy output. tidy does not seem to work either.

Here is some minimal data to load as Random_Tracks:

   track_id steplength
     <fctr>      <dbl>
 1   49593A 0.00000000
 2   49593A 0.47918441
 3   49593A 0.46654421
 4   49593A 0.48018923
 5   49593A 0.71400710
 6   49593A 0.35758252
 7   49593A 0.60385075
 8   49593A 0.78503816
 9   49593A 0.54192845
10   49593A 0.58040040
11   49593A 0.65381647
12   49593A 0.58918460
13   49593A 0.57775124
14   49593A 0.90311395
15   49593A 0.08008648
16   49593A 0.25568897
17   49593A 0.21103367
18   49593A 0.76625123
19   49593A 0.74180922
20   49593A 0.93648613
21   49593A 0.48482763
22   49593A 0.69910820
23   49593A 0.39311410
24   49593A 0.29748085
25   49593A 0.27830654
26   49593A 0.31932964
27   49593A 0.69976601
28   49593A 0.25546627
29   49593A 0.95409541
30   29801E 0.00000000
31   29801E 0.05489061
32   29801E 0.34348345
33   29801E 0.34834684
34   29801E 0.34808459
35   29801E 0.31985100
36   29801E 0.48691151
37   29801E 0.19251683
38   29801E 0.61683268
39   29801E 0.36238232
40   29801E 0.30703690
41   29801E 0.21100359
42   29801E 0.05879426
43   29801E 0.10818249
44   29801E 0.23526174
45   29801E 0.13857242
46   29801E 0.27194222
47   29801E 0.59982642
48   29801E 0.36022162
49   29801E 0.22279773
50   29801E 0.20496276
51   29801E 0.33738574
52   29801E 0.09493141
53   29801E 0.20564929
54   29801E 0.25444537
55   29801E 0.43179286
56   29801E 0.07274149
57   29801E 0.84223099
58   29801E 0.72873327
59   29801E 0.64422859

1 answer:

Answer 0 (score: 0)

Figured I'd post the answer that I ended up using.

Assuming that the table of hidden Markov models fitted with depmixS4 is named HMM, with the fits stored in a list column hmm.model, it's rather easy to accomplish with a loop.

library(depmixS4)  # provides posterior()

# Initialize an empty list with one slot per fitted model
datalist <- vector("list", length(HMM$hmm.model))

# Calculate the posterior for every track_id i that was fitted, and append extra information.
# cbind() recycles the length-one grouping variables to match the number of posterior rows
# (which is what we want in this case); naming them gives clean column names.
for (i in seq_along(HMM$hmm.model)) {
    datalist[[i]] <- cbind(posterior(HMM$hmm.model[[i]]),
                           track_id  = HMM$track_id[[i]],
                           lipase    = HMM$lipase[[i]],
                           condition = HMM$condition[[i]])
}

# Bind the list of small data frames together into one long table
HMM_state_models <- data.table::rbindlist(datalist)
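
For the column names used in the question (track_id and hmm.storage, with no extra grouping columns), the same idea can be sketched as a small base-R helper. This is only a sketch: stack_posteriors and its extract argument are hypothetical names introduced here, not part of depmixS4. The extract argument defaults to depmixS4::posterior and is exposed so the stacking logic can be tried on plain data frames without fitting any models.

```r
# Hypothetical helper (not part of depmixS4): stack the per-track posterior
# tables into one long data frame with a track_id column.
stack_posteriors <- function(ids, models, extract = depmixS4::posterior) {
  ids <- as.character(ids)  # avoid carrying factor levels around
  pieces <- lapply(seq_along(models), function(i) {
    post <- extract(models[[i]])  # one row per observation in track i
    # Repeat the id once per posterior row, then put it in front
    data.frame(track_id = rep(ids[i], nrow(post)), post,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)  # tracks may have different lengths
  rownames(out) <- NULL
  out
}

# Assumed usage with the objects from the question (requires depmixS4):
# long <- stack_posteriors(Random_Tracks_HMM$track_id,
#                          Random_Tracks_HMM$hmm.storage)
```

Because each track contributes as many rows as it has observations, the result is the long format sketched in the question, with track_id followed by the state, S1, S2 and S3 columns that posterior() returned above.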