Bayesian networks with the catnet package: handling missing data

Posted: 2016-06-29 22:38:29

Tags: r missing-data bayesian bayesian-networks

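I'm new to this community, to R, and to programming in general. (Thanks in advance for your patience!) I'm working on a project that involves Bayesian networks.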

Straight to the question. The following code was posted on this site in response to a question titled "NA/NaN values in bnlearn package R":

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- cnNew(
      nodes = c("a1", "a2", "a3"),
      cats = list(c("1","2"), c("1","2"), c("1","2")),
      parents = list(NULL, c(1,3), NULL)
      )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')

##$a2
##         a1        a3         0         1
## A 0.0000000 0.0000000 0.0000000 1.0000000
## B 0.0000000 1.0000000 0.5712826 0.4287174
## A 1.0000000 0.0000000 0.0000000 1.0000000
## B 1.0000000 1.0000000 0.5685786 0.4314214

However, when I copy, paste, and run the code, I get different results (see below).

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3 
## first generate a network with a1,a3 ->a2
cnet <- cnNew(
    nodes = c("a1", "a2", "a3"),
    cats = list(c("1","2"), c("1","2"), c("1","2")),
    parents = list(NULL, c(1,3), NULL)
    )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')
## $a2
##   a1  a3   1   2
## A 1.0 1.0 0.0 1.0
## B 1.0 2.0 0.5 0.5
## A 2.0 1.0 0.5 0.5
## B 2.0 2.0 0.5 0.5

Can someone explain why my results differ? I'm asking because I'm trying to understand how catnet handles missing data.

Best,

John

1 answer:

Answer (score: 0)

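The top and bottom code are identical, so they should produce identical results. I looked through the catnet functions for other packages that use functions with the same names; that could be your problem. If another attached package exports a function with the same name, the most recently attached one masks the others, so a bare call may not reach catnet at all. It's good practice to use the :: notation when calling non-base functions.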
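To check whether masking is actually happening, you can ask R where a name is defined and which definition a bare call resolves to. The sketch below uses only base/utils functions (find, get, environment, environmentName). The helper name where_defined is hypothetical, made up for this illustration. The toy example masks stats::sd with a global definition, so it runs without catnet installed.

```r
# Hypothetical helper (not part of catnet): list every search-path entry that
# defines `name`, and report which environment the bare name resolves to.
where_defined <- function(name) {
  hits <- find(name)                                   # e.g. "package:catnet", "package:other"
  fn   <- get(name, mode = "function")                 # the function a bare call would use
  used <- environmentName(environment(fn))             # its defining namespace/environment
  list(found_in = hits, used_from = used)
}

# Toy demonstration of masking: a global definition shadows stats::sd
sd <- function(x) "not the stats version"
where_defined("sd")   # found_in: ".GlobalEnv" "package:stats"; used_from: "R_GlobalEnv"
sd(1:3)               # the masking (global) version runs
stats::sd(1:3)        # the :: form always reaches the stats namespace
rm(sd)                # remove the mask again

# With catnet attached, the same check would be, e.g.:
# where_defined("cnProb")
```

If find() returns more than one entry for cnProb, cnSetProb, or cnDiscretize, the bare calls in the question may be going to a different package than catnet. Writing catnet::cnProb(...) sidesteps that.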

rm(list=ls())
library(catnet)

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- catnet::cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- catnet::cnNew(
  nodes = c("a1", "a2", "a3"),
  cats = list(c("1","2"), c("1","2"), c("1","2")),
  parents = list(NULL, c(1,3), NULL)
)


## set the empirical probabilities from data=my.data
cnet2 <- catnet::cnSetProb(cnet,data=my.data)

## to get the conditional probability table
catnet::cnProb(cnet2,which='a2')

# $a2
#   a1  a3   1   2
# A 1.0 1.0 0.0 1.0
# B 1.0 2.0 0.5 0.5
# A 2.0 1.0 0.5 0.5
# B 2.0 2.0 0.5 0.5
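As a hand check on how missing values seem to be treated, the table above can be recomputed in base R from the discretized data printed earlier. This sketch makes an assumption that is not taken from catnet's documentation. It assumes rows with any NA are dropped (listwise deletion), and that parent configurations with no remaining rows get a uniform 0.5/0.5. That assumption is only consistent with this particular output. It does not explain the differing first table in the question.

```r
# Discretized data exactly as printed by cnDiscretize above (NA = missing a3)
my.data <- data.frame(
  a1 = c(1, 2, 1, 2, 2, 1, 1, 2, 1, 1),
  a2 = c(2, 1, 2, 2, 1, 2, 1, 1, 1, 2),
  a3 = c(1, 2, 1, 2, NA, 1, NA, NA, NA, 1)
)

# Assumption: keep only fully observed rows (listwise deletion)
complete <- my.data[complete.cases(my.data), ]

# Count a2 within each (a1, a3) parent configuration; fixing the factor
# levels keeps configurations that never occur in the data
counts <- table(
  a1 = factor(complete$a1, levels = 1:2),
  a3 = factor(complete$a3, levels = 1:2),
  a2 = factor(complete$a2, levels = 1:2)
)

# Normalize within each parent configuration to get P(a2 | a1, a3);
# empty configurations give 0/0 = NaN, replaced here by a uniform 1/2
cpt <- prop.table(counts, margin = c(1, 2))
cpt[is.nan(cpt)] <- 1 / 2

ftable(as.table(cpt), row.vars = c("a1", "a3"))
```

Only rows 1, 2, 3, 4, 6 and 10 survive the NA filter. All four rows with a1 = 1, a3 = 1 have a2 = 2, which gives 0 and 1. The two rows with a1 = 2, a3 = 2 split evenly, which gives 0.5 and 0.5. The remaining two configurations have no data at all.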