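I have a dataset of 100 people, each of whom may be diagnosed with any of 5 conditions. Any combination of conditions can occur, but I have set it up so that the probability of condition D depends on condition A, and the probability of E depends on B.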
set.seed(14)
numpeople <- 100
diagnoses <- data.frame(A=rbinom(numpeople, 1, .15),
B=rbinom(numpeople, 1, .1),
C=rbinom(numpeople, 1, .2)
)
# Probability of diagnosis for D is .6 if the patient has A (an increase of .4), otherwise .2
diagnoses$D <- sapply(diagnoses$A, function(x) rbinom(1, 1, .4*x+.2))
# Probability of diagnosis for E is .8 if the patient has B (an increase of .7), otherwise .1
diagnoses$E <- sapply(diagnoses$B, function(x) rbinom(1, 1, .7*x+.1))
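The dependence of D on A can be checked by comparing the observed rate of D among people with and without A. The following is a minimal sketch, assuming a vectorised `rbinom` call in place of the `sapply` loop (it need not reproduce the exact draws above); the names `intended` and `rates` are only for illustration.

```r
set.seed(14)
numpeople <- 100

# Simulate A, then D with a probability that depends on A
A <- rbinom(numpeople, 1, .15)
D <- rbinom(numpeople, 1, .4 * A + .2)  # rbinom is vectorised over prob

# Intended probabilities of D, by value of A
intended <- c(`0` = .2, `1` = .6)

# Observed proportion of D among people without A ("0") and with A ("1")
rates <- tapply(D, A, mean)

rbind(intended = intended[names(rates)], observed = rates)
```

With only 100 people the observed rates can differ noticeably from the intended ones, especially in the small group that has A.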
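To build a co-occurrence matrix, where each cell is the number of people who have both the row diagnosis and the column diagnosis, I use matrix algebra: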
diagnoses.dist <- t(as.matrix(diagnoses))%*%as.matrix(diagnoses)
diag(diagnoses.dist) <- 0
diagnoses.dist
> diagnoses.dist
   A B C  D E
A  0 1 1 11 3
B  1 0 0  1 7
C  1 0 0  5 4
D 11 1 5  0 4
E  3 7 4  4 0
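To see why the matrix product yields co-occurrence counts, here is a small sketch on a made-up data set (`toy` is hypothetical, not the data above). Entry [i, j] of `t(m) %*% m` sums `m[, i] * m[, j]` over people, which is 1 only when a person has both diagnoses.

```r
# Hypothetical toy data: 4 people, 3 diagnoses
toy <- data.frame(A = c(1, 1, 0, 0),
                  B = c(1, 0, 1, 0),
                  C = c(1, 1, 0, 1))
m <- as.matrix(toy)

# crossprod(m) computes t(m) %*% m
cooc <- crossprod(m)

# Before zeroing it, the diagonal is the number of people per diagnosis
stopifnot(all(diag(cooc) == colSums(m)))

# Brute-force count for one pair as a sanity check
stopifnot(cooc["A", "C"] == sum(toy$A == 1 & toy$C == 1))

cooc
```

Setting the diagonal to 0 afterwards, as in the code above, discards the per-diagnosis head counts and keeps only the pairwise counts.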
Then I would like to use a chord diagram to show the proportion of co-diagnoses for each diagnosis.
library(circlize)
circos.clear()
circos.par(gap.after=10)
chordDiagram(diagnoses.dist, symmetric=TRUE)
By default, the size of the sector (pie slice) allotted to each group is proportional to its number of links.
> colSums(diagnoses.dist) #Number of links related to each diagnosis
 A  B  C  D  E
16  9 10 21 18
Is it possible to set the sector widths so that they reflect the number of people with each diagnosis?
> colSums(diagnoses) #Number of people with each diagnosis
 A  B  C  D  E
16  8 20 29 18
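The two sets of totals differ because a person with k diagnoses adds k - 1 links to each of their diagnoses, so a person with a single diagnosis adds none and a person with three adds two. A small sketch on made-up data (`toy` and the other names here are hypothetical):

```r
# Hypothetical toy data: 4 people, 3 diagnoses
toy <- data.frame(A = c(1, 1, 0, 0),
                  B = c(1, 0, 1, 0),
                  C = c(1, 1, 0, 1))
m <- as.matrix(toy)

# Pairwise co-occurrence counts with the diagonal removed
links <- crossprod(m)
diag(links) <- 0
link_totals <- colSums(links)

# Same totals, built person by person: each person contributes
# (number of their diagnoses - 1) to every diagnosis they have
n_dx <- rowSums(m)
by_person <- colSums(m * (n_dx - 1))
stopifnot(all(link_totals == by_person))

rbind(people = colSums(m), links = link_totals)
```

In this toy example diagnosis A has 2 people but 3 links, the same kind of mismatch as diagnosis B above (8 people, 9 links).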
This problem seems somewhat related to section 14.5 of the circlize book, but I'm not sure how to work out the math for the gap.after parameter.
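Based on section 2.3 of the circlize book, I tried setting the sector sizes with circos.initialize, but I think the chordDiagram function overrides this, because the proportions on the outside of the plot are exactly the same: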
circos.clear()
circos.par(gap.after=10)
circos.initialize(factors=names(diagnoses), x=colSums(diagnoses)/sum(diagnoses), xlim=c(0,1))
chordDiagram(diagnoses.dist, symmetric=TRUE)
I see lots of options for adjusting the tracks in chordDiagram, but not many for the sectors. Is there a way to do this?
Answer 0 (score: 1)
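In your case, the number of people in a category can sometimes be smaller than its total number of co-occurrences with the other categories. For example, category B has 9 co-occurrences in total, but only 8 people.
If that is not a problem, you can put values on the diagonal of the matrix that correspond to the number of people who are in only that one category. In the example code below, I just put random numbers on the diagonal to illustrate the idea: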
diagnoses.dist <- t(as.matrix(diagnoses))%*%as.matrix(diagnoses)
diag(diagnoses.dist) = sample(10, 5)
# since the matrix is symmetric, we set the upper triangle to zero.
# we don't use `symmetric = TRUE` here because the values on the diagonal
# are still used.
diagnoses.dist[upper.tri(diagnoses.dist)] = 0
par(mfrow = c(1, 2))
# here you can remove `self.link = 1` to see the difference
chordDiagram(diagnoses.dist, grid.col = 2:6, self.link = 1)
# If you don't want to see the "mountains"
visible = matrix(TRUE, nrow = nrow(diagnoses.dist), ncol = ncol(diagnoses.dist))
diag(visible) = FALSE
chordDiagram(diagnoses.dist, grid.col = 2:6, self.link = 1, link.visible = visible)
PS: the link.visible option is only available in the latest version of circlize.
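Instead of random numbers, the diagonal could hold the difference between each head count and its link total. The following is a minimal sketch using the counts printed in the question. It assumes that with `self.link = 1` a diagonal entry adds its value once to the sector width, which is worth checking on the resulting plot; `padding` and `target_width` are names made up for this example.

```r
# Co-occurrence counts and head counts copied from the question's output
dx <- c("A", "B", "C", "D", "E")
links <- matrix(c( 0, 1, 1, 11, 3,
                   1, 0, 0,  1, 7,
                   1, 0, 0,  5, 4,
                  11, 1, 5,  0, 4,
                   3, 7, 4,  4, 0),
                nrow = 5, byrow = TRUE, dimnames = list(dx, dx))
people <- c(A = 16, B = 8, C = 20, D = 29, E = 18)

# Padding so that link total + padding equals the head count.
# It cannot be negative, so a diagnosis with more links than people
# (B here) gets 0 and keeps its link total as its width.
padding <- pmax(people - colSums(links), 0)

mat <- links
mat[upper.tri(mat)] <- 0   # keep each pair once, as in the answer
diag(mat) <- padding

# Sector widths expected under the self.link = 1 assumption
target_width <- colSums(links) + padding
rbind(people, target_width)
```

The matrix `mat` can then be passed to `chordDiagram(mat, grid.col = 2:6, self.link = 1, link.visible = visible)` in place of the randomly padded matrix. Sector B would still be drawn 9 units wide rather than 8, for the reason given at the start of this answer.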