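I'm writing some R code and I'm stuck.
Background (not required to solve the problem): I compute a joint probability by multiplying independent marginal distributions. The marginal probability vectors are generated iteratively by ProbGenerationProcess(). On each iteration it outputs one named vector, for example: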
Iteration 1:
Color =
Blue Green
0.2 0.8
Iteration 2:
Material =
Cotton Silk
0.7 0.3
Iteration 3:
Country =
China USA
0.6 0.4
......
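Desired result: I want the joint probability for every combination of values, i.e. the product of one element taken from each marginal vector. The format should look like this: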
Color Material Country Prob
Blue Cotton China 0.084 (= 0.2*0.7*0.6)
Blue Cotton USA 0.056 (= 0.2*0.7*0.4)
Blue Silk China 0.036 (= 0.2*0.3*0.6)
Blue Silk USA ..
Green Cotton China ..
Green Cotton USA ..
... ... ... ...
My implementation: here is my code:
joint.names = NULL # data frame to store the marginal value names
joint.probs = NULL # stores the probabilities
for (i in iterations) {
  marginal = ProbGenerationProcess(VarUniqueToIteration) # output is numeric with names
  if ( is.null(joint.names) ) {
    # initialize the data frames
    joint.names = names(marginal)
    joint.probs = marginal
  } else {
    # (my hope:) iteratively populate joint.names and joint.probs
    joint.names = expand.grid(joint.names, names(marginal))
    expanded.prob = expand.grid(joint.probs, marginal)
    joint.probs = expanded.prob$Var1 * expanded.prob$Var2 # row-by-row multiplication
  }
}
Output: joint.probs always comes out correct, but joint.names does not work the way I want. After the first two iterations everything is fine. I get:
joint.names =
Var1 Var2
1 Blue Cotton
2 Green Cotton
3 Blue Silk
4 Green Silk
... ...
From the third iteration on, it goes wrong:
joint.names =
Var1.Var1 Var1.Var2 Var1.Var1.1 Var1.Var2.1 Var2
1 Blue Cotton Blue Cotton China
2 Green Cotton Green Cotton China
3 Blue Silk Blue Silk USA
4 Green Silk Green Silk USA
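I guess my first question is: is this the most efficient way to get the result I want? If it is, and expand.grid() is the function I should be using, how do I initialize it correctly?
Any help is appreciated!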
Answer 0 (score: 2)
merge() is your friend.
color <- data.frame(color=c("blue","green"),prob=c(0.2,0.8))
material <- data.frame(material=c("cotton","silk"),prob=c(0.7,0.3))
country <- data.frame(country=c("china","usa"),prob=c(0.6,0.4))
dat <- merge(merge(color[1],material[1]),country[1]) # get names first
# same as: expand.grid(c("china","usa"),c("cotton","silk"),c("blue","green"))
dat <- merge(dat, color, by="color")
dat <- merge(dat, material, by="material")
dat <- merge(dat, country, by="country")
dat$joint <- dat$prob.x * dat$prob.y * dat$prob # joint calc
dat <- dat[-grep("^prob",colnames(dat))] # cleanup extra probs
Result:
country material color joint
1 china cotton blue 0.084
2 china cotton green 0.336
3 china silk blue 0.036
4 china silk green 0.144
5 usa cotton blue 0.056
6 usa cotton green 0.224
7 usa silk blue 0.024
8 usa silk green 0.096
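The same merge idea can be folded over any number of marginals, which may fit the iterative setup in the question better than three hand-written merges. The following is only a sketch under assumptions: the list `marginals` and the helpers `to_df` and `cross_multiply` are hypothetical names, and the input is assumed to be one named numeric vector per variable. It relies on merge() with by = NULL, which returns the Cartesian product of the two data frames and suffixes the clashing prob columns with .x and .y.

```r
# Hypothetical input: one named numeric vector per variable
marginals <- list(
  Color    = c(Blue = 0.2, Green = 0.8),
  Material = c(Cotton = 0.7, Silk = 0.3),
  Country  = c(China = 0.6, USA = 0.4)
)

# Turn one marginal into a two-column data frame: <variable name>, prob
to_df <- function(var, p) {
  out <- data.frame(names(p), unname(p), stringsAsFactors = FALSE)
  names(out) <- c(var, "prob")
  out
}
tables <- Map(to_df, names(marginals), marginals)

# Cross join two tables, then collapse the two prob columns into one
cross_multiply <- function(acc, nxt) {
  m <- merge(acc, nxt, by = NULL)  # by = NULL gives the Cartesian product
  m$prob <- m$prob.x * m$prob.y
  m[setdiff(names(m), c("prob.x", "prob.y"))]
}

joint <- Reduce(cross_multiply, tables)
joint
```

Because the running product is stored in a single prob column after every step, the column-name bookkeeping stays the same no matter how many variables are added.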
Answer 1 (score: 1)
For simplicity, how about this (although if performance is a concern, merge may be better):
PROBS<-data.frame(Item=rep(c("Color","Material","Country"),each=2),
Value=c("Blue","Green","Cotton","Silk","China","USA"),
Prob=c(0.2,0.8,0.7,0.3,0.6,0.4))
rownames(PROBS)<-PROBS$Value
GRID<-expand.grid(split(PROBS$Value,PROBS$Item)) # split() gives one vector of values per Item
GRID$probs<-apply(GRID,1,function(x)prod(PROBS[c(x),"Prob"]))
GRID
# Color Country Material probs
#1 Blue China Cotton 0.084
#2 Green China Cotton 0.336
#3 Blue USA Cotton 0.056
#4 Green USA Cotton 0.224
#5 Blue China Silk 0.036
#6 Green China Silk 0.144
#7 Blue USA Silk 0.024
#8 Green USA Silk 0.096
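As for why the loop in the question breaks on the third iteration: by then joint.names is already a data frame, and expand.grid() appears to treat a data frame argument as a vector whose elements are its columns rather than as a set of rows, which would explain the duplicated Var1.Var1 / Var1.Var2 columns. One way around this, sketched here under assumptions, is to collect the marginals in a list inside the loop and call expand.grid() only once at the end. The list name `marginals` is hypothetical; in the asker's loop each element would come from ProbGenerationProcess().

```r
# Hypothetical: filled inside the loop, e.g.
#   marginals[[var.name]] <- ProbGenerationProcess(...)
marginals <- list(
  Color    = c(Blue = 0.2, Green = 0.8),
  Material = c(Cotton = 0.7, Silk = 0.3),
  Country  = c(China = 0.6, USA = 0.4)
)

# One call builds every combination of value names
grid <- expand.grid(lapply(marginals, names), stringsAsFactors = FALSE)

# For each variable, look up the probability of the label in each row,
# then multiply the per-variable vectors element by element
probs <- Map(function(p, labels) unname(p[labels]), marginals, grid)
grid$Prob <- Reduce(`*`, probs)
grid
```

Keeping the labels as character (stringsAsFactors = FALSE) means each lookup `p[labels]` is by name, so it does not depend on the order of values within a marginal.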