I want to extract a substring from a block of text using a regex in R. I am new to R, so I don't know where to start or how to go about it.
string<-c(".\n Written by\nJ-S-Golden \n
\n \n \n Plot Summary\n |\n Plot
Synopsis\n \n \n Plot Keywords:\n wrongful
imprisonment\n |\n escape from prison\n
|\n based on the works of stephen king\n |\n
prison\n |\n voice over narration\n | See
All (296) » \n \n Taglines:\nFear can hold you
prisoner. Hope can set you free. \n \n")
Given the string above, the output I want is:
Plot Keywords:
\n wrongful imprisonment\n
|\n escape from prison\n
|\n based on the works of stephen king\n
|\n prison\n
|\n voice over narration\n
| See All (296) » \n \n
I don't know how to extract clean data from the string. Can anyone help?
Answer 0 (score: 1)
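Here is a solution using base R's sub function. The pattern matches, and keeps, the leading text Plot Keywords:. It then uses a tempered dot to consume any character up to, but not including, the next label followed by a colon (here, Taglines:). The (?s) flag lets the dot match newlines, and perl=TRUE is required because the negative lookahead (?!...) is only supported by the PCRE engine.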
sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", string, perl=TRUE)
[1] "Plot Keywords:\n wrongful \nimprisonment\n
|\n escape from prison\n
\n|\n based on the works of
stephen king\n
|\n \nprison\n |\n voice over narration\n
| See \nAll (296) » \n \n "
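Since the question asks for clean data, the extracted section can be post-processed into a character vector of keywords. The following is only a sketch under stated assumptions: the input is a shortened stand-in for the scraped text (whitespace simplified, the » character dropped), and the names section, body, and keywords are illustrative, not part of the original answer.

```r
# Shortened stand-in for the scraped text (an assumption for illustration);
# the labels and keywords come from the question, the whitespace is simplified.
string <- paste0(
  ".\n Written by\nJ-S-Golden \n Plot Keywords:\n wrongful imprisonment\n",
  " |\n escape from prison\n |\n prison\n | See All (296) \n \n",
  " Taglines:\nFear can hold you prisoner. \n"
)

# Step 1: same pattern as in the answer. Keep everything from
# "Plot Keywords:" up to (not including) the next "label:" token.
section <- sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1",
               string, perl = TRUE)

# Step 2: drop the heading, split on the literal "|", trim whitespace.
body <- sub("^Plot Keywords:", "", section)
keywords <- trimws(strsplit(body, "|", fixed = TRUE)[[1]])

# Step 3: discard empty pieces and the trailing "See All (...)" link text.
keywords <- keywords[nzchar(keywords) & !grepl("^See All", keywords)]
keywords
```

For this stand-in input, keywords ends up holding the three keyword strings with the surrounding newlines and spaces removed. Splitting with fixed = TRUE avoids having to escape the pipe, which would otherwise be read as regex alternation.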
In this particular case, a plain regex demo may be more useful than an R demo, so here is a link: