I am working with a data.csv file and need to process a particular pattern in the data. Currently, the class column in my data.csv looks like this:
org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java
org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java
org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java
org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java
org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java
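Now I need to replace the text around the parenthesis "(" (the method name before it and everything after it) with the text ".java". In this case, the output I want is: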
org.apache.camel.bam.TimeExpression.java
org.apache.camel.bam.rules.TemporalRule.java
org.apache.camel.bam.rules.ActivityRules.java
org.apache.camel.bam.rules.ProcessRules.java
org.apache.camel.bam.processor.JpaBamProcessor.java
org.apache.camel.bam.processor.JpaBamProcessor.java
Currently, I am trying the following code:
dscls<-gsub("\\.[^.]+($", "java", data$class)
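So, basically, I am trying to find the text up to "(" and replace it with the text ".java". However, it does not produce the correct output. Can someone help me get the regular expression right?
Answer 0: (score: 1)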
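We can use sub to match a word (\\w+), followed by a literal opening parenthesis (\\(), followed by another word (\\w+) and a dot (\\.), and replace the match with an empty string ("").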
sub("\\w+\\(\\w+\\.", "", data$class)
#[1] "org.apache.camel.bam.TimeExpression.java"
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"
#[4] "org.apache.camel.bam.rules.ProcessRules.java"
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
data <- structure(list(class =
c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
"org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java",
"org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java",
"org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java",
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java",
"org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java"
)), .Names = "class", row.names = c(NA, -6L), class = "data.frame")
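As a side note on the pattern in the question: an unescaped ( is treated as the start of a group rather than as a literal character, and the replacement "java" lacks the leading dot. Below is a minimal sketch of a corrected version of that attempt. The helper name fix_class is hypothetical, and the hard-coded ".java" assumes every entry refers to a Java source file.

```r
# Hypothetical helper: drop ".method(ClassName.ext" and append ".java".
fix_class <- function(x) {
  # \\.    a literal dot before the method name
  # [^.]+  the method name (a run of non-dot characters)
  # \\(    a literal opening parenthesis (must be escaped)
  # .*$    everything up to the end of the string
  sub("\\.[^.]+\\(.*$", ".java", x)
}

x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java")
fix_class(x)
```

Strings that contain no "(" do not match the pattern and are returned unchanged.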
Answer 1: (score: 1)
Here df$x contains the data you shared.
gsub("\\w+\\(.*", "java", df$x)
[1] "org.apache.camel.bam.TimeExpression.java" "org.apache.camel.bam.rules.TemporalRule.java"
[3] "org.apache.camel.bam.rules.ActivityRules.java" "org.apache.camel.bam.rules.ProcessRules.java"
[5] "org.apache.camel.bam.processor.JpaBamProcessor.java" "org.apache.camel.bam.processor.JpaBamProcessor.java"
Answer 2: (score: 1)
Since your strings already end in .java (at least in the example), you could also try this:
strs <- c('org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java','org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java','org.apache.camel.bam.rules.ActivityRules.processExchange(ActivityRules.java','org.apache.camel.bam.rules.ProcessRules.processExchange(ProcessRules.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java','org.apache.camel.bam.processor.JpaBamProcessor.processEntity(JpaBamProcessor.java')
gsub('\\.\\w+\\(\\w+(\\.java)', '\\1', strs)
#[1] "org.apache.camel.bam.TimeExpression.java"
#[2] "org.apache.camel.bam.rules.TemporalRule.java"
#[3] "org.apache.camel.bam.rules.ActivityRules.java"
#[4] "org.apache.camel.bam.rules.ProcessRules.java"
#[5] "org.apache.camel.bam.processor.JpaBamProcessor.java"
#[6] "org.apache.camel.bam.processor.JpaBamProcessor.java"
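The three patterns in this thread agree on the sample data but behave differently on other input. The sketch below compares them side by side; the variable names a0, a1, a2 and the ".scala" test string are made up for illustration only.

```r
x <- c("org.apache.camel.bam.TimeExpression.evaluate(TimeExpression.java",
       "org.apache.camel.bam.rules.TemporalRule.processExchange(TemporalRule.java")

# The three patterns suggested in the answers above.
a0 <- sub("\\w+\\(\\w+\\.", "", x)               # removes "method(Class."
a1 <- gsub("\\w+\\(.*", "java", x)               # replaces "method(..." with "java"
a2 <- gsub("\\.\\w+\\(\\w+(\\.java)", "\\1", x)  # keeps the captured ".java"

# All three give the same result on the sample data.
same <- identical(a0, a1) && identical(a1, a2)

# They diverge once the extension is not ".java" (made-up input).
y <- "a.b.C.run(C.scala"
b0 <- sub("\\w+\\(\\w+\\.", "", y)               # keeps the original extension
b1 <- gsub("\\w+\\(.*", "java", y)               # forces the "java" extension
b2 <- gsub("\\.\\w+\\(\\w+(\\.java)", "\\1", y)  # no match, string left as-is
```

Which behavior is preferable depends on whether non-Java entries can occur in the column and whether they should be rewritten.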