Question

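I have a large dataset with more than 45 classes of drugs. How can I find the conditional probability of class AB given class AC, where presence of a class is checked for each ID? 1, 2, 3, etc. are unique IDs.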

My dataset looks like this:

ID1.AB AD AC FG AB DC GM AC
ID2.AB AC DG GM
ID3.AB DG GM AC

Can this be done in R? I tried using the prob function in R, but it gave me an error.

PS: The classes are not necessarily in sequence. I treat each class as unique within each ID, regardless of how many times it occurs.
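A minimal base-R sketch of the per-ID counting described above, assuming the data arrive as character lines in the `IDn.CLASS CLASS ...` form shown in the sample. It splits off the ID, keeps each class at most once per ID, and estimates P(AB | AC) as the share of IDs containing AC that also contain AB. The names `raw`, `classes` and `cond_prob` are hypothetical, chosen only for illustration.

```r
# Sample data from the question; the "IDn." prefix format is an assumption
raw <- c("ID1.AB AD AC FG AB DC GM AC",
         "ID2.AB AC DG GM",
         "ID3.AB DG GM AC")

# Split "ID1.AB AD ..." into the ID and its space-separated classes
ids <- sub("\\..*$", "", raw)
classes <- strsplit(sub("^[^.]*\\.", "", raw), " ")

# Count each class at most once per ID
classes <- lapply(classes, unique)
names(classes) <- ids

# Hypothetical helper: P(a | b) over IDs =
# (#IDs containing both a and b) / (#IDs containing b)
# Returns NaN if no ID contains b
cond_prob <- function(sets, a, b) {
  has_a <- vapply(sets, function(s) a %in% s, logical(1))
  has_b <- vapply(sets, function(s) b %in% s, logical(1))
  sum(has_a & has_b) / sum(has_b)
}

cond_prob(classes, "AB", "AC")  # 1: every ID with AC also has AB
cond_prob(classes, "DG", "AC")  # 2/3
```

Because `%in%` tests exact membership in the per-ID set, a class that appears twice in one ID (AB and AC in ID1) is still counted once for that ID.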

Answer 1

I may be misunderstanding what you are looking for, but a simple conditional probability would look like this:

# Create an example dataset (random, for illustration only)
set.seed(1)
mystring = c("AD", "AC", "BD", "DC")

n = 45
k = NULL
for(i in 1:n){
  samp = sample(mystring, 3, replace = TRUE)
  k = c(k, paste(samp, collapse = " "))
}

st = data.frame(id = 1:n, k, stringsAsFactors = FALSE)

library(stringr)

# Rows that contain both classes,
# i.e. occurrences of the intersection
alpha = str_detect(st[,2], "AC") & str_detect(st[,2], "AD")

# Rows that contain AC
beta = str_detect(st[,2], "AC")

# P(AD | AC) = P(AD ∩ AC) / P(AC)
(sum(alpha)/n) /
  (sum(beta)/n)

Also, based on the data sample you provided, I assumed that the classes are stored as strings.
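With 45+ classes it may be convenient to get every pairwise conditional probability at once. A sketch of one common approach, assuming the data have already been turned into a list of per-ID class sets (the `sets` list below is hypothetical, built from the question's sample): build a binary ID x class incidence matrix, whose cross-product counts the IDs shared by each pair of classes, then divide each column by its diagonal entry.

```r
# Hypothetical per-ID class sets, each class listed once per ID
sets <- list(ID1 = c("AB", "AD", "AC", "FG", "DC", "GM"),
             ID2 = c("AB", "AC", "DG", "GM"),
             ID3 = c("AB", "DG", "GM", "AC"))

# Binary ID x class incidence matrix: 1 if the class occurs in that ID
all_classes <- sort(unique(unlist(sets)))
inc <- t(vapply(sets, function(s) as.integer(all_classes %in% s),
                integer(length(all_classes))))
colnames(inc) <- all_classes

# co[i, j] = number of IDs containing both class i and class j;
# the diagonal holds the number of IDs containing each class
co <- crossprod(inc)

# cond[i, j] = P(class i | class j) = co[i, j] / co[j, j]
cond <- sweep(co, 2, diag(co), "/")

cond["AB", "AC"]
```

Column `j` of `cond` then holds P(each class | class j), so `cond[, "AC"]` gives the probability of every class given AC. Every diagonal entry is positive here because `all_classes` is taken from the sets themselves, so the division never hits zero.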
