R: Return the column names of all matching values in a row

Asked: 2017-03-29 15:16:16

Tags: r

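I have a data frame (hits_map) that lists genes (rows) against the binding sites found within each gene (columns). The values indicate how many of the sites are within each gene, with NA standing for 0.

This is a small subset, as the actual data frame is much larger: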

         AscG Dan.4 IclR.3 MraZ.1
afaE      NA     1     NA      1
afaF      NA    NA     NA     NA
agn43.1    1    NA      1     NA
agn43.2    1    NA     NA     NA
agn43.3    1    NA     NA     NA
chuA      NA    NA     NA      1
csgA       1    NA     NA      1
csgB      NA    NA     NA     NA
csgC      NA    NA     NA     NA

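For each row, I would like to get a list of the binding sites (column names) that have a value, which I can then use to extract rows from a corresponding data frame, nameseq, for more information.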

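At the moment I do this one row at a time with the commands below, using a function remove_zero_cols to drop the zero values, but I would like to be able to do it for every row of the input data frame.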

vec <- hits_map[row,]
vec <- remove_zero_cols(vec)
vec <- colnames(vec)
nameseq[nameseq$Name %in% vec,]
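
For reference, the single-row procedure above could be made self-contained roughly as follows. This is only a sketch: the body of remove_zero_cols and the contents of nameseq are not shown in the question, so both are hypothetical stand-ins here, and the Info column is made up purely for illustration.

```r
# Part of the example data from the question (NA stands for 0 sites).
hits_map <- data.frame(
  AscG   = c(NA, NA, 1),
  Dan.4  = c(1, NA, NA),
  IclR.3 = c(NA, NA, 1),
  MraZ.1 = c(1, NA, NA),
  row.names = c("afaE", "afaF", "agn43.1")
)

# Hypothetical stand-in: drop columns that contain only NA or 0.
remove_zero_cols <- function(df) {
  keep <- vapply(df, function(x) any(!is.na(x) & x != 0), logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical lookup table keyed by binding-site name.
nameseq <- data.frame(
  Name = colnames(hits_map),
  Info = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# The steps from the question, applied to one row.
row <- "afaE"
vec <- hits_map[row, ]
vec <- remove_zero_cols(vec)
vec <- colnames(vec)
result <- nameseq[nameseq$Name %in% vec, ]
```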

Any suggestions on how to solve this?

1 Answer:

Answer 0: (score: 1)

One approach is to convert the data frame, row by row, into a single vector and create a logical vector based on the value you are looking for, making sure that FALSE is converted to NA. Then create a vector of repeated column names with the same length as the logical vector, subset it, and convert the result back into a matrix:

> set.seed(1)
> DF = data.frame(first = sample(c(NA,1), 5, T), second = sample(c(NA,1), 5, T),
+                 third = sample(c(NA,1), 5, T), fourth = sample(c(NA,1), 5, T),
+                 fifth = sample(c(NA,1), 5, T))
> DF
  first second third fourth fifth
1    NA      1    NA     NA     1
2    NA      1    NA      1    NA
3     1      1     1      1     1
4     1      1    NA     NA    NA
5    NA     NA     1      1    NA
> DFvector = as.vector(t(DF))
> DFvector
 [1] NA  1 NA NA  1 NA  1 NA  1 NA  1  1  1  1  1  1  1 NA NA NA NA NA  1  1 NA

# Create a repeated vector of column names
> columnNames = rep(colnames(DF), times = nrow(DF))
> myNames = columnNames[as.logical(DFvector)]
> myNames[is.na(myNames)] = ""
> myNames
 [1] ""       "second" ""       ""       "fifth"  ""       "second" ""       "fourth" ""       "first"
[12] "second" "third"  "fourth" "fifth"  "first"  "second" ""       ""       ""       ""       ""
[23] "third"  "fourth" ""

# Convert to matrix, by row
> myMatrix = matrix(myNames, ncol = ncol(DF), byrow = T)

# Can group per row, by using assertr package
> library(assertr)
> library(stringr)
> concat = assertr::col_concat(myMatrix[], sep = " ")
> concat
[1] " second   fifth"                 " second  fourth "                "first second third fourth fifth"
[4] "first second   "                 "  third fourth "
> noWS = trimws(concat)
> noWS
[1] "second   fifth"                  "second  fourth"                  "first second third fourth fifth"
[4] "first second"                    "third fourth"
> noS = gsub(pattern = "\\s+", replacement = " ", x = noWS)
> noS
[1] "second fifth"                    "second fourth"                   "first second third fourth fifth"
[4] "first second"                    "third fourth"
> stringr::str_split(noS, " ", simplify = T)
     [,1]     [,2]     [,3]    [,4]     [,5]
[1,] "second" "fifth"  ""      ""       ""
[2,] "second" "fourth" ""      ""       ""
[3,] "first"  "second" "third" "fourth" "fifth"
[4,] "first"  "second" ""      ""       ""
[5,] "third"  "fourth" ""      ""       ""

Now you can use the same rows as in the original data frame to get the corresponding column names for each row. I hope someone can post a dplyr / data.table alternative, because doing it this way is quite tedious.
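
A shorter route may be possible in base R (rather than dplyr or data.table) by testing for non-NA entries directly and collecting the column names per row. The following is a minimal sketch under that assumption. It reuses the DF values shown above, typed in by hand so that it does not depend on the random number generator, and the names present and namesPerRow are introduced here only for illustration.

```r
# Same values as the DF printed above, typed in directly.
DF <- data.frame(
  first  = c(NA, NA, 1, 1, NA),
  second = c(1, 1, 1, 1, NA),
  third  = c(NA, NA, 1, NA, 1),
  fourth = c(NA, 1, 1, NA, 1),
  fifth  = c(1, NA, 1, NA, NA)
)

# TRUE wherever a value is present (not NA).
present <- !is.na(as.matrix(DF))

# For each row, keep the names of the columns that hold a value.
namesPerRow <- lapply(seq_len(nrow(present)), function(i) {
  colnames(present)[present[i, ]]
})
names(namesPerRow) <- rownames(DF)
```

Each element of namesPerRow is a character vector that can be used like vec in the question, for example nameseq[nameseq$Name %in% namesPerRow[[1]], ]. If the data held zeros instead of NA, the test would need to be adjusted accordingly.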