I have a list containing several vectors, like this:
$`56`
[1] "OTU2998" "UniRef90_A0A1Z9FS94" "UniRef90_A0A257ESC3"
[4] "UniRef90_A0A293NAV3" "UniRef90_A0A2E1NMU8" "UniRef90_A0A2E1NPX9"
[7] "UniRef90_A0A2E1NQL1" "UniRef90_A0A2E1NRD2" "UniRef90_X0UC66"
$`57`
[1] "OTU3820" "UniRef90_A0A1Z9H3N2" "UniRef90_A0A2D5I161"
[4] "UniRef90_A0A2E6PRN5"
$`58`
[1] "OTU4452" "UniRef90_A0A1Z9KBI8" "UniRef90_A0A2E1VTI6"
[4] "UniRef90_A0A2G2KCN6" "UniRef90_UPI000BFEC744"
$`59`
[1] "OTU0245" "UniRef90_A0A1Z9MPM9" "UniRef90_A0A2E2ME98"
[4] "UniRef90_A0A2E8X9N7"
Is it possible to extract only the "OTUXXX" information? That is, I would like to get something like this:
$`56`
[1] "OTU2998"
$`57`
[1] "OTU3820"
$`58`
[1] "OTU4452"
$`59`
[1] "OTU0245"
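Answer 0 (score: 2)
I like the purrr::map family of functions because they make it easy to pass along functions and arguments. Two quick options for pulling out these elements are grep with value = TRUE, which returns the matching strings rather than just their indices, or stringr::str_subset, which returns the same strings.
The regex here matches strings that consist of "OTU" followed by one or more digits and nothing else.
Both approaches handle multiple matches per element: I added an item "OTU1234" to the last list element to illustrate this.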
dl <- list(
`56` = c("OTU2998", "UniRef90_A0A1Z9FS94", "UniRef90_A0A257ESC3", "UniRef90_A0A293NAV3", "UniRef90_A0A2E1NMU8", "UniRef90_A0A2E1NPX9", "UniRef90_A0A2E1NQL1", "UniRef90_A0A2E1NRD2", "UniRef90_X0UC66"),
`57` = c("OTU3820", "UniRef90_A0A1Z9H3N2", "UniRef90_A0A2D5I161", "UniRef90_A0A2E6PRN5"),
`58` = c("OTU4452", "UniRef90_A0A1Z9KBI8", "UniRef90_A0A2E1VTI6", "UniRef90_A0A2G2KCN6", "UniRef90_UPI000BFEC744"),
`59` = c("OTU0245", "UniRef90_A0A1Z9MPM9", "UniRef90_A0A2E2ME98", "UniRef90_A0A2E8X9N7", "OTU1234")
)
purrr::map(dl, ~grep("^OTU\\d+$", ., value = TRUE))
#> $`56`
#> [1] "OTU2998"
#>
#> $`57`
#> [1] "OTU3820"
#>
#> $`58`
#> [1] "OTU4452"
#>
#> $`59`
#> [1] "OTU0245" "OTU1234"
purrr::map(dl, stringr::str_subset, "^OTU\\d+$")
# same output as above
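To make the role of the value argument concrete, here is a minimal base R sketch. The vector x is a hypothetical toy input, not the data from the question; it only shows that grep returns positions by default and the matching strings when value = TRUE, and that the anchors ^ and $ reject strings with extra characters around the id.

```r
# Hypothetical toy vector (an assumption for illustration only)
x <- c("OTU2998", "UniRef90_A0A1Z9FS94", "OTU12", "xOTU77")

# Default: integer positions of the elements that match
idx <- grep("^OTU\\d+$", x)

# value = TRUE: the matching strings themselves
vals <- grep("^OTU\\d+$", x, value = TRUE)

print(idx)   # 1 3
print(vals)  # "OTU2998" "OTU12"
```

"xOTU77" is dropped because the pattern is anchored at the start of the string.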
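Answer 1 (score: 1)
We can loop through the list and, using grepl, extract the elements that match the substring 'OTU' at the start (^) of the string, followed by exactly four digits (\\d{4}) up to the end ($) of the string.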
lapply(lst1, function(x) x[grepl("^OTU\\d{4}$", x)])
#$`56`
#[1] "OTU2998"
#$`57`
#[1] "OTU3820"
#$`58`
#[1] "OTU4452"
#$`59`
#[1] "OTU0245" "OTU1234"
NOTE: this uses only base R methods.
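The choice between \\d{4} and \\d+ matters if the ids are not always four digits long. The sketch below contrasts the two patterns on a hypothetical list (lst_demo is made up for illustration and is not the question's data); which pattern is appropriate depends on how the real OTU ids are formatted.

```r
# Hypothetical toy list with ids of different lengths (illustration only)
lst_demo <- list(
  a = c("OTU0245", "OTU12345", "UniRef90_X0UC66"),
  b = c("OTU7", "UniRef90_A0A2E8X9N7")
)

# Exactly four digits after "OTU"
exact4 <- lapply(lst_demo, function(x) x[grepl("^OTU\\d{4}$", x)])

# One or more digits after "OTU"
any_len <- lapply(lst_demo, function(x) x[grepl("^OTU\\d+$", x)])

str(exact4)   # a keeps "OTU0245"; b is an empty character vector
str(any_len)  # a keeps both OTU ids; b keeps "OTU7"
```

Elements with no match come back as character(0) rather than being dropped from the list.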
Or, if we are tidyverse fans, use keep:
library(tidyverse)
map(lst1, keep, str_detect, '^OTU\\d{4}$')
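For readers without the tidyverse installed, base R's Filter plays a role similar to keep: it retains the elements for which a predicate returns TRUE. This is only a rough sketch of that idea, with a hypothetical helper is_otu and a made-up list lst_demo; it is not a claim about how keep is implemented.

```r
# Hypothetical toy list (illustration only)
lst_demo <- list(
  `1` = c("OTU0001", "UniRef90_X0UC66"),
  `2` = c("UniRef90_A0A2E8X9N7", "OTU0002", "OTU0003")
)

# Hypothetical predicate: TRUE for strings of the form OTU + four digits
is_otu <- function(s) grepl("^OTU\\d{4}$", s)

# Keep only the elements of each vector that satisfy the predicate
kept <- lapply(lst_demo, function(x) Filter(is_otu, x))

print(kept)
```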
# Data used in the examples above
lst1 <- list(
`56` = c("OTU2998", "UniRef90_A0A1Z9FS94", "UniRef90_A0A257ESC3", "UniRef90_A0A293NAV3", "UniRef90_A0A2E1NMU8", "UniRef90_A0A2E1NPX9", "UniRef90_A0A2E1NQL1", "UniRef90_A0A2E1NRD2", "UniRef90_X0UC66"),
`57` = c("OTU3820", "UniRef90_A0A1Z9H3N2", "UniRef90_A0A2D5I161", "UniRef90_A0A2E6PRN5"),
`58` = c("OTU4452", "UniRef90_A0A1Z9KBI8", "UniRef90_A0A2E1VTI6", "UniRef90_A0A2G2KCN6", "UniRef90_UPI000BFEC744"),
`59` = c("OTU0245", "UniRef90_A0A1Z9MPM9", "UniRef90_A0A2E2ME98", "UniRef90_A0A2E8X9N7", "OTU1234")
)