I have the following data frame:
df <- structure(list(cell_type = c("Adipocytes", "Astrocytes", "B cells"
), V1.x = structure(c(NA, 14L, 4L), .Label = c("alb", "beta-s",
"ccr2", "cd74", "cx3cr1", "fosb", "gria2", "gzma", "lck", "myh6",
"plp1", "ptgs2", "s100a9", "slc1a2", "ttr"), class = "factor"),
V2.x = structure(c(7L, 18L, 8L), .Label = c("1500015o10rik",
"apold1", "ccl5", "cd74", "coro1a", "cybb", "fabp4", "h2-aa",
"hpx", "mag", "ms4a4b", "myh7", "s100a8", "selplg", "slc4a1",
"smoc2", "snap25", "xist"), class = "factor"), V3.x = structure(c(8L,
1L, 6L), .Label = c("bcan", "coro1a", "crispld2", "csf1r",
"emcn", "h2-ab1", "itgb2", "lpl", "mal", "mt3", "myl2", "ngp",
"nkg7", "rhd", "s100a8", "serpina1a", "slc1a2", "tyrobp"), class = "factor")), row.names = c(NA,
3L), class = "data.frame")
It looks like this:
cell_type V1.x V2.x V3.x
1 Adipocytes <NA> fabp4 lpl
2 Astrocytes slc1a2 xist bcan
3 B cells cd74 h2-aa h2-ab1
What I want to do is convert it into a list of vectors named by cell_type, dropping the <NA> values along the way, to get:
$Adipocytes
fabp4 lpl
$Astrocytes
slc1a2 xist bcan
$`B cells`
cd74 h2-aa h2-ab1
How can I achieve that?
I'm stuck with this attempt: lapply(group_split(df, cell_type), as.vector)
Answer 0 (score: 4)
We can use split to split the data by cell_type, then use lapply to remove the NA values from each piece:
lapply(split(df[-1], df$cell_type), function(x) x[!is.na(x)])
#$Adipocytes
#[1] "fabp4" "lpl"
#$Astrocytes
#[1] "slc1a2" "xist" "bcan"
#$`B cells`
#[1] "cd74" "h2-aa" "h2-ab1"
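To see why the one-liner above returns plain character vectors even though the columns are factors, here is a minimal sketch that walks through the intermediate steps. The data frame is a shortened stand-in rebuilt from the printed table in the question (same values, fewer factor levels), and the names pieces, one, mask and kept are only illustrative.

```r
# Stand-in for the question's data frame, rebuilt from its printed form
df <- data.frame(
  cell_type = c("Adipocytes", "Astrocytes", "B cells"),
  V1.x = factor(c(NA, "slc1a2", "cd74")),
  V2.x = factor(c("fabp4", "xist", "h2-aa")),
  V3.x = factor(c("lpl", "bcan", "h2-ab1")),
  stringsAsFactors = FALSE
)

# Step 1: split() returns a list of one-row data frames named by cell_type
pieces <- split(df[-1], df$cell_type)

# Step 2: is.na() on a data frame gives a logical matrix
one <- pieces[["Adipocytes"]]
mask <- is.na(one)

# Step 3: indexing a data frame with a logical matrix goes through
# as.matrix(), so the factor columns come back as character values
kept <- one[!mask]

# The same three steps applied to every piece
result <- lapply(pieces, function(x) x[!is.na(x)])
```

If a cell_type occurred in more than one row, the matrix indexing would return the values column by column rather than row by row, which may or may not be the order you want.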
A variation using dplyr and purrr would be to split by cell_type with group_split, discard the NA values in each list element, and assign the names with setNames.
library(dplyr)
library(purrr)
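df %>%
  # convert the factor columns to character
  mutate_all(as.character) %>%
  # one tibble per cell_type, without the grouping column
  # (the argument was called `keep` in older dplyr versions)
  group_split(cell_type, .keep = FALSE) %>%
  map(~discard(flatten_chr(.), is.na)) %>%
  setNames(df$cell_type)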
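One thing to watch with this variant: group_split returns the groups ordered by the grouping key, while setNames(df$cell_type) uses the row order of df. The two line up in the question's data because the rows are already sorted by cell_type. The following is a minimal sketch, not part of the original answer, that takes the names from group_keys instead so the labels do not depend on row order. The data frame df2 is a hypothetical, deliberately unsorted input with character columns.

```r
library(dplyr)
library(purrr)

# Hypothetical input whose rows are NOT sorted by cell_type
df2 <- data.frame(
  cell_type = c("B cells", "Adipocytes"),
  V1.x = c("cd74", NA),
  V2.x = c("h2-aa", "fabp4"),
  stringsAsFactors = FALSE
)

grouped <- group_by(df2, cell_type)

# Group labels in the same order that group_split() uses
keys <- group_keys(grouped)$cell_type

result <- grouped %>%
  group_split(.keep = FALSE) %>%
  # flatten each one-row tibble to a character vector and drop the NAs
  map(~ discard(unlist(.x, use.names = FALSE), is.na)) %>%
  setNames(keys)
```

Here result$Adipocytes holds only "fabp4" even though that row comes second in df2.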
Answer 1 (score: 1)
We can also do this in base R, using apply over the rows:
setNames(apply(df[-1], 1, function(x) unname(x)[complete.cases(x)]), df[[1]])
#$Adipocytes
#[1] "fabp4" "lpl"
#$Astrocytes
#[1] "slc1a2" "xist" "bcan"
#$`B cells`
#[1] "cd74" "h2-aa" "h2-ab1"
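A caveat with the apply approach: it returns a list here only because the rows end up with different lengths after the NA values are dropped. When every row yields the same number of values, apply simplifies the result to a matrix. The sketch below shows that case and a row-wise lapply alternative that always returns a list. The data frame df3 and the helper names are hypothetical, chosen only for illustration.

```r
# Hypothetical input with no missing values, so every row has two entries
df3 <- data.frame(
  cell_type = c("Adipocytes", "Astrocytes"),
  V1.x = c("fabp4", "slc1a2"),
  V2.x = c("lpl", "xist"),
  stringsAsFactors = FALSE
)

# Equal-length results are simplified by apply() into a matrix, not a list
simplified <- apply(df3[-1], 1, function(x) unname(x)[complete.cases(x)])

# Row-wise alternative: lapply() never simplifies, so the result is a list
m <- as.matrix(df3[-1])
result <- setNames(
  lapply(seq_len(nrow(m)), function(i) {
    x <- unname(m[i, ])  # values of row i without the column names
    x[!is.na(x)]         # drop missing entries
  }),
  df3[[1]]
)
```

Going through as.matrix first also converts factor columns to character, so the same sketch should work on the question's original df.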