Computing the lift of context relationships in a probabilistic suffix tree?

Posted: 2017-01-27 14:07:50

Tags: r markov-chains pst traminer sequence-analysis

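A PST gives the probabilities and conditional probabilities of the various contexts and their subsequent states. However, it would be very helpful to be able to compute the lift of the relationship between a context and a subsequent state (and its significance). How can I do this?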

# Load libraries
library(RCurl)
library(TraMineR)
library(PST)

# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
data <- read.csv(text = x, header = FALSE, stringsAsFactors = FALSE)  # header row is dropped in seqdef() below

# Create sequence object (drop the header row and first column; missing values are coded as "*")
data.seq <- seqdef(data[2:nrow(data), 2:ncol(data)], missing = NA, right = NA, nr = "*")

# Grow a probabilistic suffix tree (maximum context length L = 6), keeping the missing state "*"
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)

# Mine the contexts at tree level l = 2 for the next state N3 (pmin = 0 keeps every context)
cmine(S1, pmin = 0, state = "N3", l = 2)

This gives several contexts, one of which is:

[>] context: N2 
           EX         FA         I1         I2   I3         N1        N2         N3        NR         QU         TR         *
S1 0.07692308 0.08076923 0.05769231 0.07692308 0.05 0.06923077 0.1038462 0.06153846 0.1269231 0.07307692 0.08076923 0.1423077

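Let's say I want to compute the lift of the relationship between N2 and N3. We know that the conditional probability of N3 given N2 is 0.05. To compute the lift, do I simply divide the conditional probability by the unconditional probability of the resulting state, as follows: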

0.05/unconditional probability of N3
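Written out, the ratio proposed here is the usual definition of lift for the rule "context N2 is followed by N3"; a value above 1 means N3 follows N2 more often than its overall frequency alone would suggest:

```latex
\mathrm{lift}(N2 \rightarrow N3) = \frac{P(N3 \mid N2)}{P(N3)}
```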

If we run seqstatf(data.seq), we can see that N3 accounts for a fraction of 0.01721715 of the states. That would mean the lift is:

0.05/0.01721715=2.90408110518

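Or would it be more appropriate to use the probability of N3 given e (the empty context), as returned by cmine(S1, pmin = 0, state = "N3", l = 1), which is 0.001554569? That would yield a lift of:

0.05/0.001554569=32.163255539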

This is much higher...
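Both candidates are the same ratio with a different estimate of P(N3) in the denominator. A minimal sketch in R that plugs in the numbers quoted above; `lift` is a hypothetical helper written for illustration, not a function from PST or TraMineR:

```r
# Hypothetical helper: lift of "context -> next state" as a ratio of probabilities
lift <- function(p_next_given_ctx, p_next) {
  stopifnot(p_next > 0)
  p_next_given_ctx / p_next
}

p_cond <- 0.05                               # P(N3 | N2) as stated in the question
lift_seqstatf <- lift(p_cond, 0.01721715)    # denominator from seqstatf(data.seq)
lift_root     <- lift(p_cond, 0.001554569)   # denominator from cmine(..., l = 1)

round(c(seqstatf = lift_seqstatf, root = lift_root), 2)
```

Only the marginal changes between the two versions, so the gap between them comes entirely from how P(N3) is estimated.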

1 answer:

Answer 0 (score: 2)

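The reasoning is correct. However, the problem with seqstatf is that it does not take the missing state (*) into account. Here is how to get the overall probability of N3: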
nN3 <- sum(data.seq == 'N3')           # number of N3 occurrences
nn <- nrow(data.seq)*ncol(data.seq)    # all positions, missing (*) ones included
(pN3 <- nN3/nn)                        # overall probability of N3

which gives 0.001556148.
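The difference between this estimate and the seqstatf one is just whether missing positions are counted in the denominator. A toy illustration in base R on a made-up matrix (hypothetical data, not the data set from the question); here NA stands in for the missing state, whereas in data.seq it is the "*" symbol set by `nr = "*"`:

```r
# Hypothetical toy data: 3 sequences of length 5, NA marks missing positions
seqs <- matrix(c("N2", "N3", "QU", NA,   NA,
                 "N3", "N2", "N3", "TR", NA,
                 "QU", "N2", NA,   NA,   NA),
               nrow = 3, byrow = TRUE)

n_N3 <- sum(seqs == "N3", na.rm = TRUE)     # occurrences of N3
share_all   <- n_N3 / length(seqs)          # missing positions counted (as in the answer)
share_valid <- n_N3 / sum(!is.na(seqs))     # non-missing positions only

c(all = share_all, valid = share_valid)
```

The more missing positions there are, the smaller the first share becomes relative to the second, and the larger any lift computed from it.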

So the lift is:

ctx <- cmine(S1, pmin = 0, state = "N3", l = 2)
(liftN3 <- ctx$N2[,"N3"]/pN3)   # P(N3 | N2) / P(N3)

which gives 39.5.
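The 39.5 can be checked by hand from the printed context row. A minimal sketch that copies the N2 row from the cmine() output above into a named vector (the vector name is illustrative only). Note that the N3 entry in that row is 0.06153846; 0.05 is the I3 entry:

```r
# Next-state probabilities for context N2, copied from the cmine() output above
p_given_N2 <- c(EX = 0.07692308, FA = 0.08076923, I1 = 0.05769231,
                I2 = 0.07692308, I3 = 0.05,       N1 = 0.06923077,
                N2 = 0.1038462,  N3 = 0.06153846, NR = 0.1269231,
                QU = 0.07307692, TR = 0.08076923, "*" = 0.1423077)

pN3 <- 0.001556148                     # overall share of N3, missing states counted
lift_N3 <- unname(p_given_N2["N3"] / pN3)
round(lift_N3, 1)
```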

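Another option, which may be more meaningful, is to consider the conditional probabilities with the missing states excluded, i.e. those obtained from a tree grown without the missing state (with.missing = FALSE).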
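A rough approximation of this idea, without refitting the tree, is to renormalise the N2 row after dropping "*" and to divide by a marginal that also ignores missing states. This is an assumption-laden sketch, not what a tree grown with with.missing = FALSE would return: that tree is estimated from different counts and is also affected by the ymin smoothing. The marginal 0.01721715 is the seqstatf value quoted in the question, used on the assumption (consistent with the answer) that it excludes missing states:

```r
# Next-state probabilities for context N2, copied from the cmine() output above
p_given_N2 <- c(EX = 0.07692308, FA = 0.08076923, I1 = 0.05769231,
                I2 = 0.07692308, I3 = 0.05,       N1 = 0.06923077,
                N2 = 0.1038462,  N3 = 0.06153846, NR = 0.1269231,
                QU = 0.07307692, TR = 0.08076923, "*" = 0.1423077)

# Condition on the next state not being missing: drop "*" and renormalise
valid   <- setdiff(names(p_given_N2), "*")
p_valid <- p_given_N2[valid] / sum(p_given_N2[valid])

# Assumption: seqstatf()'s 0.01721715 is N3's share among non-missing positions
p_N3_valid    <- 0.01721715
lift_N3_valid <- unname(p_valid["N3"] / p_N3_valid)
round(lift_N3_valid, 2)
```

Under these assumptions the lift drops to roughly 4, which illustrates how strongly the missing-state mass inflates the 39.5 figure.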