Suppose I have the following string:
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
I would like to extract the strings between "=" and ";" to get the following output:
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
Can I use strsplit() with more than one split element?
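Answer 0 (score: 16)
1) strsplit with matrix. Split on both ";" and "=", which yields alternating keys and values, then reshape into a two-row matrix and take the second row: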
> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
2) strsplit with gsub. Or use gsub to delete each "key=" prefix first, then split what is left on ";":
> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
3) strsplit with sub. Or split on ";" first, then use sub to strip everything up to and including the "=" in each piece:
> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
4) strapplyc. Or extract each run of consecutive non-semicolon characters that follows an equals sign:
> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
Added additional strsplit solutions.
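Solution 1) works because the split produces alternating keys and values. If the keys are also of interest, the same two-row matrix can be turned into a named vector. This is only a small base R sketch, assuming every field is a well-formed key=value pair; m and kv are illustrative names, not part of the original answer.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Splitting on either delimiter gives key, value, key, value, ...
# matrix() fills column by column, so row 1 holds the keys and row 2 the values.
m <- matrix(strsplit(s, "[;=]")[[1]], nrow = 2)

# Attach the keys as names so values can be looked up by field.
kv <- setNames(m[2, ], m[1, ])
kv[["Name"]]
```

unname(kv) gives back the plain character vector asked for in the question.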
Answer 1 (score: 1)
I know this is an old question, but I have found that lookaround regular expressions work very well for this kind of problem:
library(stringr)
your_string <- '/this/file/name.txt'
result <- str_extract(string = your_string, pattern = "(?<=/)[^/]*(?=\\.)")
result
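In other words, the (?<=...) part looks before the desired string for ... (in this case a forward slash). [^/]* matches consecutive characters that are not forward slashes (in this case within name.txt). (?=...) looks after the desired string for ... (in this case the special period character, which needs to be escaped as \\.). Because of that lookahead, the match stops just before the period, so result is "name". This also works on data frames: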
library(dplyr)
strings <- c('/this/file/name1.txt', 'tis/other/file/name2.csv')
df <- as.data.frame(strings) %>%
  mutate(name = str_extract(string = strings, pattern = "(?<=/)[^/]*(?=\\.)"))
# Optional
names <- df %>% pull(name)
Or, in your case:
your_string <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
result <- str_extract(string = your_string, pattern = "(?<=;Alias=)[^;]*(?=;)")
result # Outputs 'MIMAT0027618'
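The pattern above pulls out only the Alias value. To get every value, as the question asks, the same lookbehind idea can be applied with a global match. Here is a minimal sketch in base R (so stringr is not required), assuming the values never contain "=" or ";":

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# (?<==) requires an "=" immediately before the match;
# [^;]+ then takes everything up to the next ";" or the end of the string.
# perl = TRUE is needed because lookbehind is a PCRE feature.
values <- regmatches(s, gregexpr("(?<==)[^;]+", s, perl = TRUE))[[1]]
values
```

With stringr loaded, str_extract_all(s, "(?<==)[^;]+")[[1]] should be the equivalent call.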