Extracting a substring from text using R

Time: 2020-05-31 08:43:10

Tags: r regex gsub

I have string data that looks like this:

a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

From this, I need to extract the string "Social Media Learning and behaviour". I used the following code:

gsub("        Uploaded on .* ", "", gsub("\n    Update Your Profile to Dissolve This Message\n", "",a)) 

This gives me the following output:

"Social Media Learning and behaviour\n\n"

I cannot get the pattern to match exactly. What pattern extracts "Social Media Learning and behaviour" without the trailing "\n\n"?

2 Answers:

Answer 0: (score: 1)

You can capture the preceding line in a group and match the following line, which contains Uploaded:

(.*)\r?\n[^\S\r\n]+Uploaded on

Regex demo

a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")
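
Note that str_match returns a character matrix: the first column holds the full match and the second column holds the captured group, so the title itself is the `[, 2]` element of the result. For readers who would rather not depend on stringr, the following is a minimal base R sketch of the same idea. It assumes the same input `a` and the same pattern; the variable names `m` and `title` are only illustrative. `perl = TRUE` is used so that `\S` inside a character class is interpreted as intended.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# regexec() locates the full match and each capture group;
# regmatches() then pulls the matched text out of the string.
m <- regmatches(a, regexec("(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on", a, perl = TRUE))

# m[[1]][1] is the full match, m[[1]][2] is the first capture group.
title <- m[[1]][2]
title
#[1] "Social Media Learning and behaviour"
```

Because `.` does not match a newline here, the group can only capture the single line that sits directly above the "Uploaded on" line.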

答案 1 :(得分:0)

You can extract the part between "Update Your Profile to Dissolve This Message" and "Uploaded on":

sub(".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*", "\\1", a)
#[1] "Social Media Learning and behaviour"

You can also use str_match from stringr.
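
One thing to keep in mind with the sub() approach is that sub() returns its input unchanged when the pattern does not match, so a string without the two markers would come back whole rather than as a missing value. The sketch below is one possible guard, not part of the original answer; the helper name `extract_title` is hypothetical.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

pattern <- ".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*"

# Hypothetical helper: return the captured text when the pattern matches,
# and NA when it does not (instead of echoing the input back).
extract_title <- function(x, pattern) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

extract_title(a, pattern)
#[1] "Social Media Learning and behaviour"

extract_title("no markers here", pattern)
#[1] NA
```

Both grepl() and sub() are vectorised, so the helper can also be applied to a whole character vector of such messages.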