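Regex to replace certain text in R

Time: 2016-11-29 04:34:09

Tags: r regex

I am working with a data.csv file and need to process a certain data pattern. Currently, the class column in my data.csv file looks like this: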

org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java     
org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java    
org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java      
org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java 
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java    
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java 

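Now I need to replace everything from the last "." before the bracket "(" through the end of the string with the text ".java". In this case, the output I want is: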

org.apache.camel.bam.TimeExpression.java     
org.apache.camel.bam.rules.TemporalRule.java     
org.apache.camel.bam.rules.ActivityRules.java    
org.apache.camel.bam.rules.ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.java      
org.apache.camel.bam.processor.JpaBamProcessor.java 

Currently, I am trying the following code:

dscls<-gsub("\\.[^.]+($", "java", data$class)

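So basically, I am trying to find the text up to the "(" and then replace it with the text ".java". However, it does not produce the correct output. Can someone help me get the regular expression right?

3 Answers:

Answer 0: (score: 1)

We can use sub to match a word (\\w+) followed by a (, followed by another word (\\w+) and a dot (\\.), and replace it with an empty string ("").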

sub("\\w+\\(\\w+\\.", "", data$class)
#[1] "org.apache.camel.bam.TimeExpression.java"
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"
#[4] "org.apache.camel.bam.rules.ProcessRules.java"
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"

data

 data <- structure(list(class = 
 c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java", 
"org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java", 
"org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java", 
"org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java", 
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java", 
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java"
)), .Names = "class", row.names = c(NA, -6L), class = "data.frame")
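
As a side note on why the attempt in the question fails: in R's regex syntax an unescaped ( opens a group, so a pattern like "\\.[^.]+($" contains a group that is never closed and should be rejected as invalid rather than matching a literal bracket. A minimal sketch, assuming a single sample string copied from the question; the corrected pattern shown here is only one possible fix and the variable names are illustrative:

```r
# One sample value copied from the question's data
x <- "org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java"

# The original attempt: the unescaped "(" starts a group that is never closed,
# so the pattern fails to compile. tryCatch turns that error into NA here.
attempt <- tryCatch(gsub("\\.[^.]+($", "java", x),
                    error = function(e) NA_character_)

# Escaping the bracket as "\\(" makes it literal; ".*$" then consumes the rest,
# and the whole match (from the last dot before the bracket) becomes ".java".
fixed <- sub("\\.[^.]+\\(.*$", ".java", x)
print(fixed)
```

Because the replacement is the literal ".java", this variant does not depend on the original string already ending in "java".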

Answer 1: (score: 1)

Here df$x contains the data you shared.

gsub("\\w+\\(.*", "java", df$x)
[1] "org.apache.camel.bam.TimeExpression.java"           "org.apache.camel.bam.rules.TemporalRule.java"       
[3] "org.apache.camel.bam.rules.ActivityRules.java"       "org.apache.camel.bam.rules.ProcessRules.java"       
[5] "org.apache.camel.bam.processor.JpaBamProcessor.java" "org.apache.camel.bam.processor.JpaBamProcessor.java"
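
The pattern above only fires when a literal ( is present, so values without a bracket should pass through untouched. A small sketch of that behavior, assuming a two-element vector in which the second entry is a made-up value that is not in the original data:

```r
x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.NoBracketHere")  # hypothetical entry without "("

# "\\w+\\(" matches the method name plus the bracket, ".*" takes the rest of
# the string, and the whole match is replaced by "java" (the dot before the
# method name is not part of the match, so it is kept).
res <- gsub("\\w+\\(.*", "java", x)
print(res)
```

Since each string contains at most one bracket, sub would give the same result as gsub here.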

Answer 2: (score: 1)

Since your strings already end in .java (at least in the example), you could also try this:

strs <- c('org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java','org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java','org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java','org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java')

gsub('\\.\\w+\\(\\w+(\\.java)', '\\1', strs)

#[1] "org.apache.camel.bam.TimeExpression.java"           
#[2] "org.apache.camel.bam.rules.TemporalRule.java"       
#[3] "org.apache.camel.bam.rules.ActivityRules.java"      
#[4] "org.apache.camel.bam.rules.ProcessRules.java"       
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
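
To illustrate what the capture group does: (\\.java) captures the trailing extension and \\1 puts it back, so the pattern only rewrites strings that really end in .java after the bracket. A short sketch, assuming a two-element vector whose second entry (ending in .scala) is a hypothetical value added for contrast:

```r
x <- c("org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java",
       "org.example.Foo.bar(Foo.scala")  # hypothetical entry, not from the question

# ".method(ClassName" is dropped and the captured ".java" is re-inserted via "\\1".
# The second string has no ".java" after the bracket, so it is left unchanged.
res <- gsub("\\.\\w+\\(\\w+(\\.java)", "\\1", x)
print(res)
```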