Regular expression to extract all words between a word and a character

Time: 2016-08-07 04:58:04

Tags: regex r

I know the basics of working with regular expressions in R. But here I have a file like this:
  

[2016-04-28 14:00:06,603] ,,,,, SERVICE_ID = 441,DEBUG,DBSEntryServlet,DBSEntryServlet:delegateToRequestManager :: SERVICE_ID = 541,SERVICE_ID = 9981

[2016-04-28 14:00:06,608] ,,,,,, DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID = 00234,SERVICE_ID = 11134,IMD = 6767

I want to extract the timestamp along with all the SERVICE_IDs on that line.

So, my expected output is:

  

[2016-04-28 14:00:06,603] SERVICE_ID = 441 SERVICE_ID = 541 SERVICE_ID = 9981

     

[2016-04-28 14:00:06,608] SERVICE_ID = 00234 SERVICE_ID = 11134

The code I tried extracts only one SERVICE_ID.

library(qdapRegex)

a <- readLines("C:\\MY_FOLDER\\vinita\\sample.txt")

testi <- rm_between(a,"SERVICE_ID",",",extract = T)

2 answers:

Answer 0: (score: 0)

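We replace runs of two or more commas with " " to get 'str2'. Then, using a regex, we match one or more whitespace characters (\\s+) that follow a ], followed by any characters (.*) up to the end of the string, and replace that match with "", so that only the [2016-04..,03] part remains; this is 'str3'. From 'str2', we extract every substring "SERVICE_ID=" followed by digits (\\d+) into a list, paste the elements of each list entry together, and finally paste the result onto 'str3'.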

library(stringr)
str2 <- gsub(",{2,}", " ", str1)
str3 <- sub("(?<=\\])\\s+.*", "", str2, perl = TRUE)
paste(str3, sapply(str_extract_all(str2, "SERVICE_ID=\\d+"), paste, collapse=" "))
#[1] "[2016-04-28 14:00:06,603] SERVICE_ID=441 SERVICE_ID=541 SERVICE_ID=9981"
#[2] "[2016-04-28 14:00:06,608] SERVICE_ID=00234 SERVICE_ID=11134" 

Data

 str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
"[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")

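For comparison, here is a minimal sketch of the same idea in base R only, without stringr. The helper name `extract_ids` is hypothetical, and the sketch assumes every line starts with a bracketed timestamp. The pattern also allows optional whitespace around the "=", which is an assumption made to cover input formatted like the sample in the question.

```r
# Sketch only: `extract_ids` is a hypothetical helper, not part of any package.
# Assumes every input line begins with a "[...]" timestamp.
extract_ids <- function(lines) {
  # Leading bracketed timestamp of each line
  ts <- regmatches(lines, regexpr("^\\[[^\\]]*\\]", lines, perl = TRUE))
  # Every SERVICE_ID token, tolerating spaces around "="
  ids <- regmatches(lines, gregexpr("SERVICE_ID\\s*=\\s*\\d+", lines, perl = TRUE))
  # One output string per line: timestamp followed by its IDs
  paste(ts, vapply(ids, paste, character(1), collapse = " "))
}

str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
          "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")
extract_ids(str1)
```

Matching the timestamp and the IDs directly means the comma-collapsing step is not needed here; a line without a timestamp would break the final paste, so such lines would have to be filtered out first.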
Answer 1: (score: 0)

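library(stringr)  # provides str_extract_all()

str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
          "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801   SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")
# Collapse runs of two or more commas into a single space
str2 <- gsub(",{2,}", " ", str1)
# Drop everything from the closing bracket onward, then the opening bracket
str4 <- sub("\\].*", "", str2, perl = TRUE)
str5 <- sub("\\[", "", str4, perl = TRUE)

# Collect every SERVICE_ID=<digits> token per line, separated by spaces
service_ids <- sapply(str_extract_all(str2, "SERVICE_ID=\\d+"), paste, collapse = " ")
# Two-column character matrix: timestamp and service IDs
net <- cbind(str5, service_ids)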

Output: (shown as an image in the original answer)
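If one row per SERVICE_ID is more convenient than one string per line, the same extraction could be reshaped into a long data frame. This is only a sketch in base R; the names `ts`, `ids`, and `long` are illustrative and do not come from the answer above.

```r
# Sketch only: `ts`, `ids`, and `long` are illustrative names.
str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
          "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801   SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")

# Timestamp without the surrounding brackets
ts <- sub("^\\[([^]]*)\\].*$", "\\1", str1)
# All SERVICE_ID=<digits> tokens, as a list with one element per line
ids <- regmatches(str1, gregexpr("SERVICE_ID=[0-9]+", str1))

# Repeat each timestamp once per ID found on its line
long <- data.frame(
  timestamp  = rep(ts, lengths(ids)),
  service_id = sub("SERVICE_ID=", "", unlist(ids), fixed = TRUE),
  stringsAsFactors = FALSE
)
long
```

The IDs are kept as character strings so that leading zeros, as in 00234, are not lost.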