Indices of occurrence of multiple groups in R

Time: 2018-05-05 08:56:18

Tags: r

I have a matrix like this:


structure(list(Gene_ID = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L), .Label = c("g1", "g10", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"), class = "factor"), Module_Color = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 1L), .Label = c("blue", "green", "red"), class = "factor")), .Names = c("Gene_ID", "Module_Color"), class = "data.frame", row.names = c(NA, -10L))

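I want to get the row indices at which each distinct module color occurs, and create a list "modIndices" that holds the row indices for every module color, like this: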

modIndices$red={1,3,5,8} 
#as red color appears in row 1,3,5 and 8.

modIndices$blue={2,6,10}

modIndices$green={4,7,9}

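Although I can get the indices for one particular color with the "which" function, I am unable to create the list shown above.

Please help...

1 answer:

Answer 0: (score: 2)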

We can split the sequence of row numbers by the second column to get a list of index vectors, one per module color

split(seq_len(nrow(df)), df[[2]])
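As a minimal sketch of how this produces the requested list, the block below rebuilds the question's data (with the column names from the dput output, Gene_ID and Module_Color), stores the result under the name modIndices, and compares it with a which()-based loop. The names df, colors and modIndices2 are only illustrative assumptions.

```r
# Rebuild the question's data; column names follow the dput output above
df <- data.frame(
  Gene_ID = paste0("g", 1:10),
  Module_Color = c("red", "blue", "red", "green", "red",
                   "blue", "green", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Split the row numbers 1..nrow(df) by module color;
# the result is a named list with one integer vector per color
modIndices <- split(seq_len(nrow(df)), df$Module_Color)

modIndices$red    # 1 3 5 8
modIndices$blue   # 2 6 10
modIndices$green  # 4 7 9

# Equivalent which()-based version: one which() call per distinct color
colors <- unique(df$Module_Color)
modIndices2 <- setNames(
  lapply(colors, function(col) which(df$Module_Color == col)),
  colors
)
```

split() orders the list elements by the sorted color names, while the which() loop keeps the order of first appearance, so the two lists hold the same vectors in a different order.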

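Or, using the tidyverse, create a sequence column with row_number(), group by 'Module Color', and summarise to get a list column 'ind' (the column is named `Module Color` in the data below; with the question's dput output it would be Module_Color)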

library(dplyr)
df %>% 
  mutate(rn = row_number()) %>% 
  group_by(`Module Color`) %>%
  summarise(ind = list(rn)) 
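The summarise() call returns a tibble with a list column rather than a plain named list. If the modIndices$red style of access from the question is wanted, one possible follow-up step is to name the list column by the color column, as sketched below. This assumes dplyr is installed, and the names res and modIndices are illustrative.

```r
library(dplyr)

# Same data as in the answer, with non-syntactic column names preserved
df <- data.frame(`Gene ID` = paste0("g", 1:10),
                 `Module Color` = c("red", "blue", "red", "green", "red",
                                    "blue", "green", "red", "green", "blue"),
                 stringsAsFactors = FALSE, check.names = FALSE)

# One row per color, with the row numbers collected in the list column 'ind'
res <- df %>%
  mutate(rn = row_number()) %>%
  group_by(`Module Color`) %>%
  summarise(ind = list(rn))

# Turn the list column into a named list keyed by color
modIndices <- setNames(res$ind, res$`Module Color`)

modIndices$red  # row numbers at which 'red' occurs
```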

Data

df <- data.frame(`Gene ID` = paste0("g", 1:10), 
    `Module Color` = c('red', 'blue', 'red', 'green', 'red', 'blue', 
  'green', 'red', 'green', 'blue'),
    stringsAsFactors = FALSE, check.names = FALSE)