Setting the sector widths of a chord diagram with circlize

Asked: 2017-07-13 20:52:32

Tags: r chord-diagram circlize

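I have a dataset of 100 people who have been diagnosed with up to 5 conditions. Any combination of conditions can occur, but I have set it up so that the probability of condition D depends on condition A, and the probability of E depends on B.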

set.seed(14)
numpeople <- 100
diagnoses <- data.frame(A=rbinom(numpeople, 1, .15),
                        B=rbinom(numpeople, 1, .1),
                        C=rbinom(numpeople, 1, .2)
                        )
# Probability of diagnosis for D is .6 if the patient has A, otherwise .2
diagnoses$D <- sapply(diagnoses$A, function(x) rbinom(1, 1, .4*x+.2))
# Probability of diagnosis for E is .8 if the patient has B, otherwise .1
diagnoses$E <- sapply(diagnoses$B, function(x) rbinom(1, 1, .7*x+.1))

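To build a co-occurrence matrix, in which each cell is the number of people who have both the row diagnosis and the column diagnosis, I use matrix algebra: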

diagnoses.dist <- t(as.matrix(diagnoses))%*%as.matrix(diagnoses)
diag(diagnoses.dist) <- 0
diagnoses.dist
> diagnoses.dist
   A B C  D E
A  0 1 1 11 3
B  1 0 0  1 7
C  1 0 0  5 4
D 11 1 5  0 4
E  3 7 4  4 0
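
A side note on this step: before the diagonal is zeroed, it already holds the number of people per diagnosis, so it can be worth saving. The following is a minimal sketch on made-up toy data (the `toy` data frame and its values are hypothetical, not the dataset above); `crossprod(m)` is base R's shorthand for `t(m) %*% m`.

```r
# Hypothetical toy data: 7 people, 3 diagnoses, 1 = diagnosed.
toy <- data.frame(A = c(1, 1, 0, 1, 1, 0, 1),
                  B = c(1, 0, 0, 1, 0, 1, 1),
                  C = c(0, 1, 1, 1, 0, 0, 1))
m <- as.matrix(toy)

# crossprod(m) is equivalent to t(m) %*% m.
co <- crossprod(m)
stopifnot(all(co == t(m) %*% m))

# The diagonal counts people per diagnosis (same as colSums(m)),
# so keep a copy before overwriting it.
people <- diag(co)
stopifnot(all(people == colSums(m)))

# Zero the diagonal so only co-occurrences between different diagnoses remain.
diag(co) <- 0
co
```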

I then want to use a chord diagram to show the proportion of co-diagnoses for each diagnosis.

circos.clear()
circos.par(gap.after=10)
chordDiagram(diagnoses.dist, symmetric=TRUE)

[Figure: example chord diagram with 5 groups]

By default, the sector (pie segment) allotted to each group is sized in proportion to its number of links.

> colSums(diagnoses.dist) #Number of links related to each diagnosis
 A  B  C  D  E 
16  9 10 21 18 

Is it possible to set the sector widths so that they reflect the number of people with each diagnosis?

> colSums(diagnoses) #Number of people with each diagnosis
 A  B  C  D  E 
16  8 20 29 18 

This question seems somewhat related to section 14.5 of the circlize book, but I am not sure how to work out the math for the gap.after parameter.

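Based on section 2.3 of the circlize book, I tried setting the sector sizes with circos.initialize, but I think the chordDiagram function overrides this, because the proportions on the outside are exactly the same: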

circos.clear()
circos.par(gap.after=10)
circos.initialize(factors=names(diagnoses), x=colSums(diagnoses)/sum(diagnoses), xlim=c(0,1))
chordDiagram(diagnoses.dist, symmetric=TRUE)

I see many options for adjusting the tracks in chordDiagram, but not many for the sectors. Is there a way to do this?

1 Answer:

Answer 0 (score: 1)

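In your case, the number of people in a category can sometimes be smaller than its total number of co-occurrences with other categories. For example, category B has 9 co-occurrences in total but only 8 people.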
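
The reason is that a person with three or more diagnoses is counted in several pairs at once. A small sketch on hypothetical toy data (the `toy` data frame is made up for illustration, not the question's dataset) shows the effect:

```r
# Hypothetical toy data: persons 4 and 7 have all three diagnoses.
toy <- data.frame(A = c(1, 1, 0, 1, 1, 0, 1),
                  B = c(1, 0, 0, 1, 0, 1, 1),
                  C = c(0, 1, 1, 1, 0, 0, 1))
m <- as.matrix(toy)

co <- crossprod(m)
diag(co) <- 0

links  <- colSums(co)  # total co-occurrences per diagnosis
people <- colSums(m)   # number of people per diagnosis

# A negative value means the links alone are already wider than the head
# count, so no non-negative diagonal value can pad the sector to match it.
people - links
```

In this toy example every diagnosis ends up with more co-occurrences than people, because each person with all three diagnoses contributes two links to each of their diagnoses.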

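If that is not a problem, you can put values on the diagonal of the matrix that correspond to the number of people who fall into only one category. In the example code below, I simply put random numbers on the diagonal to illustrate the idea: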

diagnoses.dist <- t(as.matrix(diagnoses))%*%as.matrix(diagnoses)
diag(diagnoses.dist) = sample(10, 5)

# Since the matrix is symmetric, we set the upper triangle to zero.
# We don't use `symmetric = TRUE` here because that would ignore the
# diagonal, which we still need.
diagnoses.dist[upper.tri(diagnoses.dist)] = 0

par(mfrow = c(1, 2))
# here you can remove `self.link = 1` to see the difference
chordDiagram(diagnoses.dist, grid.col = 2:6, self.link = 1)

# If you don't want to see the "mountains"
visible = matrix(TRUE, nrow = nrow(diagnoses.dist), ncol = ncol(diagnoses.dist))
diag(visible) = FALSE
chordDiagram(diagnoses.dist, grid.col = 2:6, self.link = 1, link.visible = visible)

P.S. The link.visible option is only available in the latest version of circlize (as of this answer).
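
To replace the random diagonal with real counts, one option is to count the people who have exactly one diagnosis. This is only a sketch on hypothetical toy data (the `toy` data frame and the `only_one` name are made up for illustration). As far as I understand `self.link = 1`, each sector then spans its co-occurrence total plus its diagonal value, so it matches the head count only when nobody has more than two diagnoses.

```r
# Hypothetical toy data: persons 3, 5 and 6 have exactly one diagnosis.
toy <- data.frame(A = c(1, 1, 0, 1, 1, 0, 1),
                  B = c(1, 0, 0, 1, 0, 1, 1),
                  C = c(0, 1, 1, 1, 0, 0, 1))
m <- as.matrix(toy)

# Per diagnosis, count the people who have that diagnosis and no other.
single   <- rowSums(m) == 1
only_one <- colSums(m[single, , drop = FALSE])

# Co-occurrence matrix with the single-diagnosis counts on the diagonal.
co <- crossprod(m)
diag(co) <- only_one

# Keep only the lower triangle plus the diagonal, as in the answer's code.
co[upper.tri(co)] <- 0

# Plot only if circlize is installed; the call mirrors the answer's example.
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize::circos.clear()
  circlize::chordDiagram(co, self.link = 1)
}
```

For the question's data, the same steps would apply with `diagnoses` in place of `toy`.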