Extracting content from a factor or data frame

Posted: 2015-08-23 12:26:01

Tags: r gsub

Below is an example of X, a factor that is part of a data frame:

 [1] "[[1]]"              "J48"                "------------------" ""                   "MSTV"              
 [6] "|"                  "|"                  "|"                  "|"                  "|"                 
[11] "|"                  "|"                  "|"                  "MSTV"               "|"                 
[16] "|"                  "|"                  "|"                  "|"                  "|"                 
[21] "|"                  "|"                  "|"                  "|"                  "|"                 
[26] "|"                  "|"                  "|"                  "|"                  "|"                 
[31] "|"                  "|"                  "|"                  "|"                  ""                  
[36] "Number"             ""                   "Size"               ""                   "like"              
[41] ""                   "The"  

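I want to extract the word MSTV (it appears twice) and ignore all the other words and the | signs. MSTV is surrounded by | symbols before and after it. I tried the command gsub("[A-Z] [1-9]:", "", X) without success. How can I extract the word MSTV, which may appear in the middle, between | symbols?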
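Because X is already split into separate elements, one option is to skip regular expressions over a single string and look at each element's neighbours instead. The sketch below is only an illustration under stated assumptions: the helper name `extract_piped_words` is hypothetical, and it assumes the target is a letters-only element with a "|" directly before *or* after it (in the printout, the first MSTV at position 5 has "" before it and "|" after it). The vector is rebuilt from the values shown above.

```r
# Rebuild the vector shown above as a factor (values copied from the printout).
X <- factor(c("[[1]]", "J48", "------------------", "", "MSTV",
              rep("|", 8), "MSTV", rep("|", 20), "",
              "Number", "", "Size", "", "like", "", "The"))

# Hypothetical helper (assumption): keep letters-only elements that have a "|"
# immediately before or immediately after them in the vector.
extract_piped_words <- function(f) {
  v <- as.character(f)              # use the factor labels, not the integer codes
  n <- length(v)
  prev <- c("", v[-n])              # element before each position ("" at the start)
  nxt  <- c(v[-1], "")              # element after each position ("" at the end)
  is_word <- grepl("^[A-Za-z]+$", v)
  v[is_word & (prev == "|" | nxt == "|")]
}

extract_piped_words(X)
#> [1] "MSTV" "MSTV"
```

Note that gsub() substitutes text and returns a vector of the same length, so on its own it cannot pull matching words out of X; selecting elements by position, as here, avoids that problem.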

1 answer:

Answer 0 (score: 3)

I think you mean something like this:

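library(stringr)
# Example vector: the target word sits between two "|" elements
x <- c("|", "MSTV", "|", "s", "", ":")
# Collapse the vector into one string ("|MSTV|s:"), then extract a run of letters
# with "|" directly before it (lookbehind) and directly after it (lookahead).
# regex() replaces the perl() modifier, which is deprecated in current stringr.
str_extract(paste0(x, collapse = ""), regex("(?<=\\|)[A-Za-z]+(?=\\|)"))
#[1] "MSTV"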
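For comparison, here is a base R sketch of the same lookaround pattern that does not need stringr. It uses gregexpr() with perl = TRUE, which returns every match rather than only the first (str_extract() stops at the first; str_extract_all() is the stringr counterpart). The variable names are illustrative only.

```r
# Same lookaround pattern as the answer, in base R (PCRE via perl = TRUE).
pat <- "(?<=\\|)[A-Za-z]+(?=\\|)"

x <- c("|", "MSTV", "|", "s", "", ":")
s <- paste0(x, collapse = "")                      # "|MSTV|s:"
regmatches(s, gregexpr(pat, s, perl = TRUE))[[1]]  # all matches in s
#> [1] "MSTV"

# Applied to the vector from the question (rebuilt from the printout):
X <- factor(c("[[1]]", "J48", "------------------", "", "MSTV",
              rep("|", 8), "MSTV", rep("|", 20), "",
              "Number", "", "Size", "", "like", "", "The"))
s2 <- paste0(as.character(X), collapse = "")
regmatches(s2, gregexpr(pat, s2, perl = TRUE))[[1]]
#> [1] "MSTV"
```

One caveat: on the question's data this finds only one MSTV, not two. After collapsing, the empty string "" before the first MSTV disappears, so that MSTV is preceded by "-" rather than "|" and the lookbehind fails.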