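I'm struggling with a problem that I'm sure has a simple solution, but I haven't been able to find it. Thanks for your help.
I'm trying to split a string of text wherever any element of a separate vector occurs. For example: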
fruits<-c("APPLE","BANANA","ORANGE")
string<-("This is a list of fruits and their properties.
APPLE This is a red fruit, typically very SWEET!
BANANA This is a yellow fruit, also sweet!
ORANGE This is an orange fruit and also, yes, sweet")
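My desired output is a list/vector of 4 elements, each containing the piece of the string that falls before/after an occurrence of any element of 'fruits'. So, something like: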
c("This is a list of fruits and their properties",
"APPLE This is a red fruit, typically very SWEET!",
"BANANA This is a yellow fruit, also sweet!",
"ORANGE This is an orange fruit and also, yes, sweet")
I've tried
strsplit(string,split=fruits)
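among a few other things, but without success. What I'm actually trying to do is split a .pdf codebook, which I've converted to .txt, by a list of words (countries) that correspond to the sections of the codebook.
Thanks in advance!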
Answer 0: (score: 3)
The "I don't really want to think about regular expressions" way is to do it like this:
strsplit(gsub(sprintf('(%s)', paste(fruits, collapse = "|")),
"MYSPLIT\\1", string),
"MYSPLIT", TRUE)[[1]]
# [1] "This is a list of fruits and their properties. \n "
# [2] "APPLE This is a red fruit, typically very SWEET! \n "
# [3] "BANANA This is a yellow fruit, also sweet! \n "
# [4] "ORANGE This is an orange fruit and also, yes, sweet"
There, I basically match APPLE, ORANGE, and BANANA and replace them with MYSPLITAPPLE and so on, which gives me a new delimiter (MYSPLIT) on which to split the string.
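To make the three steps easier to follow, here is a minimal sketch that runs them one at a time on a shortened input. The shortened `string` and the variable names `pattern`, `marked`, and `parts` are made up for illustration, and the final `trimws()` call is an addition that strips the leftover newlines.

```r
fruits <- c("APPLE", "BANANA", "ORANGE")
# Shortened, illustrative input (not the original string)
string <- "Intro text.\nAPPLE red\nBANANA yellow\nORANGE orange"

# Step 1: build an alternation with a capture group: "(APPLE|BANANA|ORANGE)"
pattern <- sprintf("(%s)", paste(fruits, collapse = "|"))

# Step 2: put the marker in front of every match; \\1 re-inserts the keyword
marked <- gsub(pattern, "MYSPLIT\\1", string)

# Step 3: split on the marker as a literal string (fixed = TRUE),
# then trim surrounding whitespace from each piece
parts <- trimws(strsplit(marked, "MYSPLIT", fixed = TRUE)[[1]])
print(parts)
```

This assumes the marker text never appears in the input. If the text starts with one of the keywords, the first element of the result will be an empty string.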
Answer 1: (score: 2)
You can use regex lookarounds:
strsplit(string, sprintf('\\s+(?=%s)',
paste(fruits, collapse='|')), perl=TRUE)[[1]]
#[1] "This is a list of fruits and their properties."
#[2] "APPLE This is a red fruit, typically very SWEET!"
#[3] "BANANA This is a yellow fruit, also sweet!"
#[4] "ORANGE This is an orange fruit and also, yes, sweet"
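The lookahead `(?=...)` only checks that a keyword follows and does not consume it, so each split removes just the whitespace in front of the keyword. For the codebook use case, where country names may contain regex metacharacters such as parentheses or dots, a possible sketch is below. The helper name `split_before` and the sample `codebook` text are hypothetical, and wrapping each key in `\Q...\E` (literal quoting in PCRE) is an assumption about what the real keys might need.

```r
# Hypothetical helper: split `text` at whitespace that precedes any of `keys`.
split_before <- function(text, keys) {
  # \\Q...\\E makes PCRE treat each key literally, e.g. "Korea (Rep.)"
  alternation <- paste0("\\Q", keys, "\\E", collapse = "|")
  # Match whitespace only when one of the keys follows it
  pattern <- sprintf("\\s+(?=%s)", alternation)
  strsplit(text, pattern, perl = TRUE)[[1]]
}

# Illustrative input, not taken from a real codebook
codebook <- "Intro.\nFrance notes\nKorea (Rep.) notes"
sections <- split_before(codebook, c("France", "Korea (Rep.)"))
print(sections)
```

The match is case-sensitive and only splits where whitespace comes right before a key, so a key at the very start of the text stays in the first element.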