用于保留案例模式,大写的正则表达式

时间:2014-10-02 23:44:10

标签: regex r

是否有正则表达式用于保留\U\L

中的案例模式?

在下面的示例中,我想将"date"转换为"month",同时保持input中使用的大小写

   from        to
  "date" ~~> "month"
  "Date" ~~> "Month"
  "DATE" ~~> "MONTH"

我目前使用三个嵌套调用sub来完成此操作。

input <- c("date", "Date", "DATE")
expected.out <- c("month", "Month", "MONTH")

sub("date", "month", 
  sub("Date", "Month", 
    sub("DATE", "MONTH", input)
  )
)

目标是拥有一个pattern和一个replace,例如

gsub("(date)", "\\Umonth", input, perl=TRUE) 

将产生所需的输出

3 个答案:

答案 0 :(得分:7)

当我认为for循环合理时,这是其中一种情况:

input <- rep("Here are a date, a Date, and a DATE",2)
pat <- c("date", "Date", "DATE")
ret <- c("month", "Month", "MONTH")

for(i in seq_along(pat)) { input <- gsub(pat[i],ret[i],input) }
input
#[1] "Here are a month, a Month, and a MONTH" 
#[2] "Here are a month, a Month, and a MONTH"

另一种礼貌的@flodel实现与循环Reduce相同的逻辑:

Reduce(function(str, args) gsub(args[1], args[2], str), 
       Map(c, pat, ret), init = input)

有关这些选项的基准测试,请参阅@ TylerRinker的答案。

答案 1 :(得分:5)

使用gsubfn包,可以避免使用嵌套的子功能,并在一次调用中执行此操作。

> library(gsubfn)
> x <- 'Here we have a date, a different Date, and a DATE'
> gsubfn('date', list('date'='month','Date'='Month','DATE'='MONTH'), x, ignore.case=T)
# [1] "Here we have a month, a different Month, and a MONTH"

答案 2 :(得分:4)

这是一个qdap方法。非常直接但不是最快的:

input <- rep("Here are a date, a Date, and a DATE",2)
pat <- c("date", "Date", "DATE")
ret <- c("month", "Month", "MONTH")


library(qdap)
mgsub(pat, ret, input)

## [1] "Here are a month, a Month, and a MONTH"
## [2] "Here are a month, a Month, and a MONTH"

基准:

input <- rep("Here are a date, a Date, and a DATE",1000)

library(microbenchmark)

(op <- microbenchmark( 
    GSUBFN = gsubfn('date', list('date'='month','Date'='Month','DATE'='MONTH'), 
             input, ignore.case=T),
    QDAP = mgsub(pat, ret, input),
    REDUCE = Reduce(function(str, args) gsub(args[1], args[2], str), 
       Map(c, pat, ret), init = input),
    FOR = function() {
       for(i in seq_along(pat)) { 
          input <- gsub(pat[i],ret[i],input) 
       }
       input
    },

times=100L))

## Unit: milliseconds
##    expr        min         lq     median         uq        max neval
##  GSUBFN 682.549812 815.908385 847.361883 925.385557 1186.66743   100
##    QDAP  10.499195  12.217805  13.059149  13.912157   25.77868   100
##  REDUCE   4.267602   5.184986   5.482151   5.679251   28.57819   100
##     FOR   4.244743   5.148132   5.434801   5.870518   10.28833   100

enter image description here