Question

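I have a data frame that contains the number of job postings for a few fields in tech/biotech, along with the number of postings that coincide with other fields. I would like to create a heatmap that shows the intersection of these fields (in number of postings) together with the proportion of these "duplicates". The data frame itself looks similar to: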

df <- data.frame(matrix(nrow=4, byrow=TRUE, data=c(14000, 3300, 
2500, 1000, 3300, 3300, 700, 300, 2500, 700, 95000,7500, 1000, 300, 7500, 108000)))

colnames(df) <- rownames(df) <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")

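So, for example, the first row starts with the total number of ML & Image job postings, then the number of ML & Image postings that also meet the criteria for Software Dev, then the number of ML & Image postings that also meet the criteria for Cloud Dev, and so on.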

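I would like the heatmap to look like the data frame as it appears when you print df in the R console, keeping the numeric posting counts, but with each cell colored by the proportion of overlap between the fields. If there is very little overlap the color should be red(ish), if the overlap is about 30-60% it should be yellow(ish), and if there is a lot of overlap it should be green(ish), with a color bar on the side for reference.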

Any help with this would be appreciated. Thanks!

Answer 1

I'm not sure I fully understand what you are asking for, but the following might give you some ideas.

> library(ggplot2)
> library(reshape2)

# Set up the data

> df <- data.frame(matrix(nrow=4, byrow=TRUE, data=c(14000, 3300, 2500, 1000, 3300, 3300, 700, 300, 2500, 700, 95000,7500, 1000, 300, 7500, 108000)))
> colnames(df) <- rownames(df) <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")

> df
                        ML & Image Software Dev Cloud Dev Bioinformatics & Health
ML & Image                   14000         3300      2500                    1000
Software Dev                  3300         3300       700                     300
Cloud Dev                     2500          700     95000                    7500
Bioinformatics & Health       1000          300      7500                  108000

# Convert df to matrix and divide each column by the diagonal value

> m <- data.matrix(df)
> m <- m / matrix(t(colSums(diag(4) * m)), nrow=4, ncol=4, byrow=TRUE)

> m
                        ML & Image Software Dev   Cloud Dev Bioinformatics & Health
ML & Image              1.00000000   1.00000000 0.026315789             0.009259259
Software Dev            0.23571429   1.00000000 0.007368421             0.002777778
Cloud Dev               0.17857143   0.21212121 1.000000000             0.069444444
Bioinformatics & Health 0.07142857   0.09090909 0.078947368             1.000000000
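
The same column-wise normalization can be written a little more directly. This is only a minimal sketch, assuming (as in the question's data) that the diagonal holds each field's total; `fields`, `counts` and `prop` are illustrative names that do not appear in the code above.

```r
# Rebuild the count matrix from the question (names are illustrative)
fields <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")
counts <- matrix(c(14000, 3300, 2500, 1000,
                   3300, 3300, 700, 300,
                   2500, 700, 95000, 7500,
                   1000, 300, 7500, 108000),
                 nrow = 4, byrow = TRUE, dimnames = list(fields, fields))

# diag(counts) extracts the per-field totals; sweep() divides every
# column j by the j-th total, i.e. prop[i, j] = counts[i, j] / counts[j, j]
prop <- sweep(counts, 2, diag(counts), "/")

# Cross-check against the matrix-of-diagonals approach used above
m_check <- counts / matrix(diag(counts), nrow = 4, ncol = 4, byrow = TRUE)
stopifnot(isTRUE(all.equal(prop, m_check)))

round(prop, 3)
```

Because each column is divided by its own total, the result is not symmetric: the 3300 shared postings are all of Software Dev (3300/3300 = 1) but only about 24% of ML & Image (3300/14000).
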

# Prepare data for ggplot2 by melting the matrix into long format and
# add the posting counts back in to be used as labels

> hm <- melt(m)
> hm$postings <- c(df[,1],df[,2],df[,3],df[,4])

> hm
                      Var1                    Var2       value postings
1               ML & Image              ML & Image 1.000000000    14000
2             Software Dev              ML & Image 0.235714286     3300
3                Cloud Dev              ML & Image 0.178571429     2500
4  Bioinformatics & Health              ML & Image 0.071428571     1000
5               ML & Image            Software Dev 1.000000000     3300
6             Software Dev            Software Dev 1.000000000     3300
7                Cloud Dev            Software Dev 0.212121212      700
8  Bioinformatics & Health            Software Dev 0.090909091      300
9               ML & Image               Cloud Dev 0.026315789     2500
10            Software Dev               Cloud Dev 0.007368421      700
11               Cloud Dev               Cloud Dev 1.000000000    95000
12 Bioinformatics & Health               Cloud Dev 0.078947368     7500
13              ML & Image Bioinformatics & Health 0.009259259     1000
14            Software Dev Bioinformatics & Health 0.002777778      300
15               Cloud Dev Bioinformatics & Health 0.069444444     7500
16 Bioinformatics & Health Bioinformatics & Health 1.000000000   108000

# Plot it

> ggplot(hm, aes(x=Var1, y=Var2)) +
        geom_tile(aes(fill=value)) +
        scale_fill_gradientn(colours=c("red","yellow","green")) +
        geom_text(aes(label=postings))

This results in:

[Image: How to create a heatmap in R with proportions and values]
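
If the goal is to get closer to what the question describes (table-like layout, yellow for roughly 30-60% overlap), one possible variation is sketched below. It assumes a ggplot2 version that supports `position = "top"` on axis scales; the names `fields`, `counts`, `prop` and `hm` and the exact color stops are illustrative choices, not part of the answer above.

```r
library(ggplot2)

# Same counts as in the question
fields <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")
counts <- matrix(c(14000, 3300, 2500, 1000,
                   3300, 3300, 700, 300,
                   2500, 700, 95000, 7500,
                   1000, 300, 7500, 108000),
                 nrow = 4, byrow = TRUE, dimnames = list(fields, fields))

# Column-wise proportions: each column divided by its diagonal (total) value
prop <- sweep(counts, 2, diag(counts), "/")

# Long format without reshape2: as.vector() unrolls a matrix column by column,
# so row labels cycle fastest (times = 4) and column labels repeat (each = 4)
hm <- data.frame(
  row      = factor(rep(fields, times = 4), levels = rev(fields)),  # reversed: first row on top
  col      = factor(rep(fields, each = 4), levels = fields),
  value    = as.vector(prop),
  postings = as.vector(counts)
)

p <- ggplot(hm, aes(x = col, y = row)) +
  geom_tile(aes(fill = value), colour = "white") +
  geom_text(aes(label = postings)) +
  scale_fill_gradientn(
    colours = c("red", "yellow", "yellow", "green"),
    values  = c(0, 0.3, 0.6, 1),  # colour positions within the 0-1 limits
    limits  = c(0, 1),
    name    = "Overlap"
  ) +
  scale_x_discrete(position = "top") +  # column labels above, like a printed table
  labs(x = NULL, y = NULL)

print(p)
```

Fixing `limits` to 0-1 makes the `values` positions line up with actual proportions, so the yellow band covers 30-60% regardless of the data. Each tile is colored by the overlap as a share of its column's total, matching the normalization used earlier.
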
