Split a string at every occurrence of any element of a given vector

Time: 2015-04-18 16:24:24

Tags: r string

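I'm struggling with a problem that I'm sure has a simple solution, but I haven't been able to find it. Thanks for any help.

I'm trying to split a string of text wherever an element of a separate vector occurs. For example: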

fruits<-c("APPLE","BANANA","ORANGE")
string<-("This is a list of fruits and their properties. 
         APPLE This is a red fruit, typically very SWEET! 
         BANANA This is a yellow fruit, also sweet! 
         ORANGE This is an orange fruit and also, yes, sweet")

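The output I want is a list/vector of 4 elements, each containing the piece of the string that lies before/after an occurrence of any element of 'fruits'. So, something like: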

c("This is a list of fruits and their properties",
"APPLE This is a red fruit, typically very SWEET!",
"BANANA This is a yellow fruit, also sweet!",
"ORANGE This is an orange fruit and also, yes, sweet")

I tried

strsplit(string,split=fruits)

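among several other things, but without success. What I'm actually trying to do is take a .pdf codebook that I've converted to .txt and split it on a list of words (countries) that correspond to the sections of the codebook.

Thanks in advance!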

2 answers:

Answer 0: (score: 3)

The "I really don't want to think about regular expressions" way to do this is:

strsplit(gsub(sprintf('(%s)', paste(fruits, collapse = "|")), 
              "MYSPLIT\\1", string), 
         "MYSPLIT", fixed = TRUE)[[1]]
# [1] "This is a list of fruits and their properties. \n         "  
# [2] "APPLE This is a red fruit, typically very SWEET! \n         "
# [3] "BANANA This is a yellow fruit, also sweet! \n         "      
# [4] "ORANGE This is an orange fruit and also, yes, sweet"         

Here I basically match APPLE, ORANGE, and BANANA and replace them with MYSPLITAPPLE and so on, which gives me a new delimiter (MYSPLIT) on which to split the string.
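The marker approach can be broken into separate steps, which makes the intermediate string easy to inspect. The following is only a sketch on a shortened sample string (not the original codebook text). The final `trimws()` call is an addition that is not part of the answer above; it strips the leftover newlines and indentation and assumes R 3.2.0 or later.

```r
fruits <- c("APPLE", "BANANA", "ORANGE")
# Shortened sample text, made up for illustration
string <- "Intro text. \n APPLE red. \n BANANA yellow. \n ORANGE orange."

# Build the alternation pattern "(APPLE|BANANA|ORANGE)"
pattern <- sprintf("(%s)", paste(fruits, collapse = "|"))

# Step 1: put the marker in front of every match; \\1 re-inserts the matched word
marked <- gsub(pattern, "MYSPLIT\\1", string)

# Step 2: split on the marker, treated as a literal string rather than a regex
pieces <- strsplit(marked, "MYSPLIT", fixed = TRUE)[[1]]

# Step 3 (addition): remove leading/trailing whitespace from each piece
pieces <- trimws(pieces)
print(pieces)
```

The marker has to be a string that never occurs in the text itself, otherwise the split will also happen at those accidental occurrences.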

Answer 1: (score: 2)

You can use regex lookarounds:

 strsplit(string, sprintf('\\s+(?=%s)',
            paste(fruits, collapse='|')), perl=TRUE)[[1]]
 #[1] "This is a list of fruits and their properties."     
 #[2] "APPLE This is a red fruit, typically very SWEET!"   
 #[3] "BANANA This is a yellow fruit, also sweet!"         
 #[4] "ORANGE This is an orange fruit and also, yes, sweet"
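In the pattern above, `\\s+` consumes the whitespace in front of a keyword, while the lookahead `(?=...)` only checks that a keyword follows and leaves it in place, so each keyword stays at the start of its piece. As a rough sketch, the same idea can be wrapped in a small function. The name `split_before` and the sample text are hypothetical, and the trailing `\\b` is an added assumption that keywords should only match as whole words (so `APPLES` does not trigger a split).

```r
# Hypothetical helper: split `text` at whitespace that directly precedes a keyword
split_before <- function(text, keywords) {
  # \\s+       : the whitespace run that is removed at each split point
  # (?=...)    : lookahead, requires a keyword next without consuming it
  # (?:a|b)\\b : any keyword, ending at a word boundary
  pattern <- sprintf("\\s+(?=(?:%s)\\b)", paste(keywords, collapse = "|"))
  strsplit(text, pattern, perl = TRUE)[[1]]
}

# Made-up sample text for illustration
text <- "Header line.\n  APPLE red, not APPLES.\n  BANANA yellow."
parts <- split_before(text, c("APPLE", "BANANA"))
print(parts)
```

This assumes the keywords contain no regex metacharacters; country names with characters such as `.` or `(` would need escaping before being pasted into the pattern.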