Advanced replacement of dynamically changing values within strings in R

Date: 2018-08-10 17:25:41

Tags: r regex string dplyr

I have a string

"Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')"

that should be replaced with

"Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','2018-07-26')"

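I have looked at regex functions and other string functions, but the solutions are long and hard to edit when anything changes. I need the simplest way to do this for an array of strings.

Note: "XXXXXX" goes in the position of the largest number, but it replaces the value of the smallest number, and the ascending order is preserved.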
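A minimal sketch of this rule on a single number list, assuming the numbers are already in ascending order so that the smallest one comes first. The helper name `shift_numbers` is hypothetical and not part of the original question.

```r
# Hypothetical helper: drop the first (smallest) number and append newval at the end
shift_numbers <- function(num_list, newval) {
  parts <- strsplit(num_list, ",", fixed = TRUE)[[1]]
  paste(c(parts[-1], newval), collapse = ",")
}

two_numbers <- shift_numbers("14246,14681", "XXXXXX")
one_number  <- shift_numbers("14246", "XXXXXX")
```

With a single number in the list, nothing remains after dropping it, so the result is just the new value.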

Here is some starter code with sample data and the desired result.

# This is the available data
olddata<-data.frame(sqlcode=c("Functionname('parameter1blue','parameter2red','14246,14681','Simple','37748','2018-07-26')",
                              "Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,14681','Simple','24548','2018-07-26')",
                              "Functionname('parameter1white','parameter2red','13587,42254','Complex','36848','2018-07-26')",
                              "Functionname('parameter1green','parameter2green','14246','Simple','37258','2018-07-26')",
                              "Functionname('parameter1red','parameter2white','14246,14681','Complex','37568','2018-07-26')",
                              "Functionname('parameter1blue','parameter2white','13587,42243','Simple','22548','2018-07-26')"),stringsAsFactors = FALSE)

# This is the value to insert
newval <- "XXXXXX"

# This is how the new data should look.
# The numbers between parameter2<color> and Simple/Complex are changed so that the first number is dropped
# and newval is placed at the position of the last number
desireddata<-data.frame(sqlcode=c("Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','37748','2018-07-26')",
                              "Functionname('parameter1green','parameter2blue','13559,13914,14246,14681,XXXXXX','Simple','24548','2018-07-26')",
                              "Functionname('parameter1white','parameter2red','42254,XXXXXX','Complex','36848','2018-07-26')",
                              "Functionname('parameter1green','parameter2green','XXXXXX','Simple','37258','2018-07-26')",
                              "Functionname('parameter1red','parameter2white','14681,XXXXXX','Complex','37568','2018-07-26')",
                              "Functionname('parameter1blue','parameter2white','42243,XXXXXX','Simple','22548','2018-07-26')"))

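2 Answers:

Answer 0 (score: 3)

OK: new rules, new code, new test data. I am keeping the earlier solutions below (gsub and `regmatches<-`), but they do not appear to follow the rules. Here is working code using the data in the OP.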

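# Locate the parenthesized argument list in each string
gr1 <- gregexpr("\\(.*\\)", olddata$sqlcode)
# Split each argument list on the "','" separators
args <- strsplit(unlist(regmatches(olddata$sqlcode, gr1)), "','")
# The third argument holds the comma-separated numbers
arg3 <- sapply(args, `[[`, 3)
# Drop the first number and append newval at the end
arg3new <- sapply(strsplit(arg3, ","), function(a) paste(c(tail(a,n=-1), newval), collapse=","))
# Put the new third argument back and write the rebuilt argument lists into the data
regmatches(olddata$sqlcode, gr1) <- sapply(mapply(`[<-`, args, list(3), arg3new, SIMPLIFY=FALSE), paste, collapse="','")
olddata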
#                                                                                                           sqlcode
# 1                     Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','37748','2018-07-26')
# 2 Functionname('parameter1green','parameter2blue','13559,13914,14246,14681,XXXXXX','Simple','24548','2018-07-26')
# 3                   Functionname('parameter1white','parameter2red','42254,XXXXXX','Complex','36848','2018-07-26')
# 4                        Functionname('parameter1green','parameter2green','XXXXXX','Simple','37258','2018-07-26')
# 5                   Functionname('parameter1red','parameter2white','14681,XXXXXX','Complex','37568','2018-07-26')
# 6                   Functionname('parameter1blue','parameter2white','42243,XXXXXX','Simple','22548','2018-07-26')
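The same result can also be sketched with two `sub()` calls. This assumes every string carries the number list as its third quoted argument, that the list contains only digits and commas, and that `newval` contains no backslashes. The helper name `shift_in_newval` is made up for illustration; it returns a new vector and leaves its input unchanged.

```r
# Hypothetical helper, not from the original answer
shift_in_newval <- function(x, newval) {
  # Prefix up to and including the 5th quote, i.e. the opening quote of the 3rd argument
  prefix <- "^((?:[^']*'){5}"
  # Step 1: append ",newval" just before the closing quote of the number list
  x <- sub(paste0(prefix, "[0-9,]+)'"), paste0("\\1,", newval, "'"), x, perl = TRUE)
  # Step 2: drop the first number together with the comma that follows it
  sub(paste0(prefix, ")[0-9]+,"), "\\1", x, perl = TRUE)
}

x <- c(
  "Functionname('parameter1blue','parameter2red','14246,14681','Simple','37748','2018-07-26')",
  "Functionname('parameter1green','parameter2green','14246','Simple','37258','2018-07-26')"
)
res <- shift_in_newval(x, "XXXXXX")
```

Appending first and dropping afterwards avoids a special case for lists with a single number, because the comma that follows the first number always exists by the time step 2 runs.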

Everything below this line is no longer needed.



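Two approaches: the first (gsub) changes every instance of the largest number found, which may or may not be a problem; the second (`regmatches<-`) replaces only the maximum returned by which.max, so it will always replace at most one number.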
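The difference between the two only shows up when the largest number occurs more than once. The following sketch uses a made-up string `s` (an assumption, not part of the data above) and replaces by position with `substr` rather than `regmatches<-`, simply to keep the example short.

```r
s <- "f('100,200,200')"   # made-up string where the maximum occurs twice
newval <- "XXXXXX"

gr <- gregexpr("[0-9]+", s)
m <- gr[[1]]                        # match positions for the single string
nums <- regmatches(s, gr)[[1]]
i <- which.max(as.integer(nums))    # index of the first occurrence of the maximum

# gsub replaces every whole-number occurrence of the maximum
all_replaced <- gsub(paste0("\\b", nums[i], "\\b"), newval, s)

# Replacing by position touches only the match selected by which.max
start <- m[i]
len <- attr(m, "match.length")[i]
one_replaced <- paste0(substr(s, 1, start - 1), newval,
                       substr(s, start + len, nchar(s)))
```

Here `all_replaced` has both occurrences of 200 replaced, while `one_replaced` keeps the second one.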


1: gsub

gr <- gregexpr("[0-9]+", olddata$sqlcode)
str( nums <- regmatches(olddata$sqlcode, gr) )
# List of 6
#  $ : chr [1:7] "1" "2" "14246" "14681" ...
#  $ : chr [1:10] "1" "2" "13027" "13559" ...
#  $ : chr [1:7] "1" "2" "13587" "42254" ...
#  $ : chr [1:6] "1" "2" "14246" "2018" ...
#  $ : chr [1:7] "1" "2" "14246" "14681" ...
#  $ : chr [1:7] "1" "2" "13587" "42243" ...
str( inds <- sapply(nums, function(n) which.max(as.integer(n))) )
#  int [1:6] 4 7 4 3 4 4
str( replacethese <- mapply(`[[`, nums, inds) )
#  chr [1:6] "14681" "14681" "42254" "14246" "14681" "42243"

mapply(function(strings,old) gsub(paste0("\\b", old, "\\b"), newval, strings),
       olddata$sqlcode, replacethese)
# [1] "Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')"                    
# [2] "Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')"
# [3] "Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')"                  
# [4] "Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')"                       
# [5] "Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')"                  
# [6] "Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')"                  

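2: `regmatches<-`

N.B.: this method works by side effect, changing the data in place (inside the frame); if that is a problem, work on a copy of the data instead.

Starting from the unchanged data, the only requirement is that the strings be character, not factor. (If you add stringsAsFactors=FALSE to your call to data.frame, read.table, read.csv, and so on, this is not an issue.)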

olddata$sqlcode <- as.character(olddata$sqlcode)

We need a function to index the return value of gregexpr. It is simple, but because the attributes need to be indexed as well, it looks a bit noisy:

index_reg <- function(gr, i) {
  newgr <- gr[i]
  attributes(newgr) <- attributes(gr)
  attr(newgr, "match.length") <- attr(newgr, "match.length")[i]
  newgr
}

With that, we can do:

gr <- gregexpr("[0-9]+", olddata$sqlcode)                  # no change
nums <- regmatches(olddata$sqlcode, gr)                    # no change
inds <- sapply(nums, function(n) which.max(as.integer(n))) # no change
regmatches(olddata$sqlcode, mapply(index_reg, gr, inds, SIMPLIFY=FALSE)) <- newval
olddata # changed in-place, SIDE-EFFECT!
#                                                                                                   sqlcode
# 1                     Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')
# 2 Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')
# 3                   Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')
# 4                        Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')
# 5                   Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')
# 6                   Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')
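To see what `index_reg` returns, here is a small sketch on one made-up string. The function body is copied from the answer above; the sample string `s` and the variable names `picked` and `picked_text` are assumptions for illustration.

```r
index_reg <- function(gr, i) {
  newgr <- gr[i]                       # keep only the i-th match start
  attributes(newgr) <- attributes(gr)  # restore match.length, index.type, useBytes
  attr(newgr, "match.length") <- attr(newgr, "match.length")[i]
  newgr
}

s <- "f('13587,42254','2018-07-26')"
gr <- gregexpr("[0-9]+", s)            # five matches: 13587, 42254, 2018, 07, 26
i <- which.max(as.integer(regmatches(s, gr)[[1]]))

# Match data narrowed down to the single largest number
picked <- index_reg(gr[[1]], i)
picked_text <- unlist(regmatches(s, list(picked)))
```

Because `picked` still looks like one element of a gregexpr result, it can be wrapped in a list and passed to `regmatches` or `regmatches<-` to read or replace just that one match.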

Answer 1 (score: 0)

Here is a stringr approach that replaces the last number in the group:

olddata <- data.frame(
  sqlcode = c(
    "Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')",
    "Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,14681','Simple','2018-07-26')",
    "Functionname('parameter1white','parameter2red','13587,42254','Complex','2018-07-26')",
    "Functionname('parameter1green','parameter2green','14246','Simple','2018-07-26')",
    "Functionname('parameter1red','parameter2white','14246,14681','Complex','2018-07-26')",
    "Functionname('parameter1blue','parameter2white','13587,42243','Simple','2018-07-26')"
  )
)

library(tidyverse)
desireddata <- olddata %>%
  mutate(sqlcode = str_replace(sqlcode, "\\d{5}(?=','(Simple|Complex))", "XXXXXX"))
desireddata
#>                                                                                                   sqlcode
#> 1                     Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')
#> 2 Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')
#> 3                   Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')
#> 4                        Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')
#> 5                   Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')
#> 6                   Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')

Created on 2018-08-10 by the reprex package (v0.2.0).
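For readers without the tidyverse, the lookahead in this pattern can also be tried in base R. This is a sketch that assumes `perl = TRUE` is acceptable; the two sample strings are adapted from the data above, with the second one switched to 'Complex' for illustration.

```r
x <- c(
  "Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')",
  "Functionname('parameter1green','parameter2green','14246','Complex','2018-07-26')"
)

# (?=...) is a lookahead: the five digits match only when they are directly
# followed by ','Simple or ','Complex, and the lookahead text is not consumed
replaced <- sub("\\d{5}(?=','(Simple|Complex))", "XXXXXX", x, perl = TRUE)
```

Like the stringr version, this replaces the last number of the group in place; it does not shift the remaining numbers as the updated rules in the question require.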