Question

I have the following data:

library(tidyverse)
df <- tibble::tribble(
  ~gene, ~celltype,
  "a",   "cel1_1",  
  "b",   "cel1_1",  
  "c",   "cel1_1",  
  "a",   "cell_2",  
  "b",   "cell_2",  
  "c",   "cell_3",  
  "d",   "cell_3"
)

df %>% group_by(celltype)
#> Source: local data frame [7 x 2]
#> Groups: celltype [3]
#> 
#> # A tibble: 7 x 2
#>    gene celltype
#>   <chr>    <chr>
#> 1     a   cel1_1
#> 2     b   cel1_1
#> 3     c   cel1_1
#> 4     a   cell_2
#> 5     b   cell_2
#> 6     c   cell_3
#> 7     d   cell_3

I use it to build two summaries. The first is the gene overlap for each pair of cell types:

celltype_pair_gene_overlap <- crossprod(table(df))
celltype_pair_gene_overlap 
#>         celltype
#> celltype cel1_1 cell_2 cell_3
#>   cel1_1      3      2      1
#>   cell_2      2      2      0
#>   cell_3      1      0      2
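
For readers wondering why crossprod(table(df)) yields overlap counts, here is a minimal sketch in base R. The names incidence, overlap_crossprod and overlap_explicit are made up for illustration, and the data frame is rebuilt with data.frame so the snippet runs without the tidyverse.

```r
# Rebuild the example data in base R (same rows as the tibble above).
df <- data.frame(
  gene     = c("a", "b", "c", "a", "b", "c", "d"),
  celltype = c("cel1_1", "cel1_1", "cel1_1", "cell_2", "cell_2", "cell_3", "cell_3")
)

# Gene-by-celltype incidence table: 1 where the gene occurs in the cell type.
incidence <- table(df)

# crossprod(X) computes t(X) %*% X, so entry [i, j] sums, over all genes,
# the product of the indicators for cell types i and j. That sum is the
# number of genes the two cell types share.
overlap_crossprod <- crossprod(incidence)
overlap_explicit  <- t(incidence) %*% incidence
```

The diagonal holds each cell type's own gene count, which is why it matches nof_genes in the second summary as long as no gene is listed twice for the same cell type.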

The second is the gene count for each cell type:

celltype_gene_count <- df %>% group_by(celltype) %>% summarise(nof_genes = n())
celltype_gene_count
#> # A tibble: 3 x 2
#>   celltype nof_genes
#>      <chr>     <int>
#> 1   cel1_1         3
#> 2   cell_2         2
#> 3   cell_3         2

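What I want to do is divide each value in celltype_pair_gene_overlap by the matching entry looked up in celltype_gene_count, using the row's cell type as the denominator.

This should give the following table: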

   celltype    cel1_1        cell_2          cell_3
   cel1_1      1.00  (3/3)   0.67 (2/3)      0.33 (1/3)
   cell_2      1.00  (2/2)   1.00 (2/2)      0    (0/2)
   cell_3      0.5   (1/2)   0    (0/2)      1    (2/2)

How can I achieve this in base R or (preferably) dplyr?

Answer 1

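We can use match to get the numeric index of each row name in the lookup table, use it to pick the corresponding nof_genes, replicate those values across the matrix, and divide: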

celltype_pair_gene_overlap/celltype_gene_count$nof_genes[
   match(row.names(celltype_pair_gene_overlap), celltype_gene_count$celltype)
        ][row(celltype_pair_gene_overlap)]

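Note: this is written on the assumption that 'celltype' is not always in the same order in both objects. If the order is the same, a simple division is enough.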
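
To see why the match step matters, here is a small sketch with the lookup table deliberately shuffled. The shuffled order and the names overlap, gene_count, denominators, ratio and ratio_indexed are assumptions for illustration, not part of the question. It also shows sweep as an alternative way to apply one denominator per row.

```r
# Overlap matrix from the question, typed in by hand.
celltypes <- c("cel1_1", "cell_2", "cell_3")
overlap <- matrix(
  c(3, 2, 1,
    2, 2, 0,
    1, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(celltype = celltypes, celltype = celltypes)
)

# Lookup table in a different order than the matrix rows (hypothetical).
gene_count <- data.frame(
  celltype  = c("cell_3", "cel1_1", "cell_2"),
  nof_genes = c(2, 3, 2)
)

# Align the denominators with the row order of the matrix.
denominators <- gene_count$nof_genes[match(rownames(overlap), gene_count$celltype)]

# Divide every row by its own denominator.
ratio <- sweep(overlap, 1, denominators, "/")

# Same result using the row() indexing from the answer above.
ratio_indexed <- overlap / denominators[row(overlap)]
```

Dividing by gene_count$nof_genes directly would use 2 as the denominator for cel1_1 here, so the alignment step is what keeps the result correct when the orders differ.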

Answer 2

You can do the division directly, because table and group_by should put the cell type levels in the same order...

celltype_pair_gene_overlap / celltype_gene_count$nof_genes

#>         celltype
#> celltype cel1_1    cell_2    cell_3
#>   cel1_1    1.0 0.6666667 0.3333333
#>   cell_2    1.0 1.0000000 0.0000000
#>   cell_3    0.5 0.0000000 1.0000000
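
The direct division works because of how R recycles a vector against a matrix. A minimal sketch, with m, v and the result names chosen only for illustration:

```r
# Same numbers as the overlap matrix and the gene counts.
m <- matrix(
  c(3, 2, 1,
    2, 2, 0,
    1, 0, 2),
  nrow = 3, byrow = TRUE
)
v <- c(3, 2, 2)

# A matrix is stored column by column, so a vector whose length equals
# the number of rows is recycled down each column: m[i, j] / v[i].
by_row <- m / v

# The same thing with the recycling spelled out.
by_row_explicit <- m / v[row(m)]

# For one denominator per column instead, use sweep with MARGIN = 2:
# m[i, j] / v[j].
by_col <- sweep(m, 2, v, "/")
```

So the shortcut gives row-wise division only, and only when the vector is already in row order.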

How to divide by values from a lookup table in R
