Question

Suppose I have the following string:

"col1 xx > col2 xx xx > col3 > col4 xx xx > col5"

How can I extract the first word before the first ">" (col1),
or the first word before the second ">" (col2),
or the first word before the third ">" (col3), and so on?

Answer 1

Assume the input is:

x <- "col1 xx > col2 xx xx > col3 > col4 xx xx > col5"

Then here are some alternatives:

1) strsplit: Split the string on a space, followed by the shortest string of any characters, followed by > and a space. No packages are used.

strsplit(x, " .*?> ")[[1]]
## [1] "col1" "col2" "col3" "col4" "col5"
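Since the question asks for the word before the n-th ">", the vector returned by strsplit can simply be indexed. A minimal sketch, reusing the same x and pattern (the variable name words is only illustrative):

```r
x <- "col1 xx > col2 xx xx > col3 > col4 xx xx > col5"

# Split once, then pick elements by position.
words <- strsplit(x, " .*?> ")[[1]]

words[1]  # word before the first ">"
## [1] "col1"
words[2]  # word before the second ">"
## [1] "col2"
words[3]  # word before the third ">"
## [1] "col3"
```

The same indexing applies to the results of the other alternatives, since each returns a character vector of the words.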

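2) strapply: This repeatedly matches a word "(\\w+)", followed by the shortest sequence of characters ".*?", up to a > or the end of the string "(>|$)", and returns the word.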

library(gsubfn)

strapply(x, "(\\w+).*?(>|$)", perl = TRUE)[[1]]
## [1] "col1" "col2" "col3" "col4" "col5"

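3) strapplyc: If we know that the words we want all consist of lowercase letters followed by digits, and that no other such words are present, then this will work: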

library(gsubfn)

strapplyc(x, "[a-z]+\\d+")[[1]]
## [1] "col1" "col2" "col3" "col4" "col5"
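If installing gsubfn is not an option, the same lowercase-letters-then-digits assumption can probably be expressed in base R with gregexpr and regmatches. This is a sketch under that same assumption, not part of the original answer; the name matched is illustrative, and [0-9] is used in place of \\d:

```r
x <- "col1 xx > col2 xx xx > col3 > col4 xx xx > col5"

# gregexpr() locates every match of the pattern in x;
# regmatches() extracts the matched substrings.
matched <- regmatches(x, gregexpr("[a-z]+[0-9]+", x))[[1]]

matched
## [1] "col1" "col2" "col3" "col4" "col5"
```

The filler words ("xx") are skipped because they contain no digits, which is exactly the condition this alternative relies on.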
