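I have a string
"Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')"
that should be replaced with
"Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','2018-07-26')"
I have looked at regex and other string functions, but the solutions are long and hard to adapt to any change. I need the simplest way to do this for a vector of strings.
Note: "XXXXXX" ends up in the position of the largest number, but the value that gets removed is the smallest one, so the remaining numbers keep their ascending order.
Here is starter code with the sample data and the desired result.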
# This is the available data
olddata<-data.frame(sqlcode=c("Functionname('parameter1blue','parameter2red','14246,14681','Simple','37748','2018-07-26')",
"Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,14681','Simple','24548','2018-07-26')",
"Functionname('parameter1white','parameter2red','13587,42254','Complex','36848','2018-07-26')",
"Functionname('parameter1green','parameter2green','14246','Simple','37258','2018-07-26')",
"Functionname('parameter1red','parameter2white','14246,14681','Complex','37568','2018-07-26')",
"Functionname('parameter1blue','parameter2white','13587,42243','Simple','22548','2018-07-26')"),stringsAsFactors = FALSE)
# This is the value to insert
newval <- "XXXXXX"
# This is what the new data should look like.
# In the list of numbers between parameter2<color> and Simple/Complex, the first
# number is dropped and newval is appended in the position of the last number.
desireddata<-data.frame(sqlcode=c("Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','37748','2018-07-26')",
"Functionname('parameter1green','parameter2blue','13559,13914,14246,14681,XXXXXX','Simple','24548','2018-07-26')",
"Functionname('parameter1white','parameter2red','42254,XXXXXX','Complex','36848','2018-07-26')",
"Functionname('parameter1green','parameter2green','XXXXXX','Simple','37258','2018-07-26')",
"Functionname('parameter1red','parameter2white','14681,XXXXXX','Complex','37568','2018-07-26')",
"Functionname('parameter1blue','parameter2white','42243,XXXXXX','Simple','22548','2018-07-26')"))
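Answer 0 (score: 3)
OK, new rules, new code, new test data. I am keeping the earlier solutions below (gsub and `regmatches<-`), but they do not follow the updated rules. Here is working code using the data from the OP.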
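# Locate the parenthesized argument list in each string
gr1 <- gregexpr("\\(.*\\)", olddata$sqlcode)
# Split each argument list on ',' (the separator between quoted arguments)
args <- strsplit(unlist(regmatches(olddata$sqlcode, gr1)), "','")
# The third argument holds the comma-separated numbers
arg3 <- sapply(args, `[[`, 3)
# Drop the first number and append newval at the end
arg3new <- sapply(strsplit(arg3, ","), function(a) paste(c(tail(a,n=-1), newval), collapse=","))
# Put the modified third argument back and write the result into the data frame
regmatches(olddata$sqlcode, gr1) <- sapply(mapply(`[<-`, args, list(3), arg3new, SIMPLIFY=FALSE), paste, collapse="','")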
olddata
# sqlcode
# 1 Functionname('parameter1blue','parameter2red','14681,XXXXXX','Simple','37748','2018-07-26')
# 2 Functionname('parameter1green','parameter2blue','13559,13914,14246,14681,XXXXXX','Simple','24548','2018-07-26')
# 3 Functionname('parameter1white','parameter2red','42254,XXXXXX','Complex','36848','2018-07-26')
# 4 Functionname('parameter1green','parameter2green','XXXXXX','Simple','37258','2018-07-26')
# 5 Functionname('parameter1red','parameter2white','14681,XXXXXX','Complex','37568','2018-07-26')
# 6 Functionname('parameter1blue','parameter2white','42243,XXXXXX','Simple','22548','2018-07-26')
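To make the rule easier to follow, here is a minimal sketch of the same idea applied to a single string. The helper name `shift_list` is hypothetical (it is not part of the code above), and the sketch assumes the number list is always the third argument and that arguments are separated by `','`.

```r
# One input string taken from the question's sample data
x <- "Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,14681','Simple','24548','2018-07-26')"
newval <- "XXXXXX"

# Hypothetical helper: drop the first number of a comma-separated list
# and append newval at the end
shift_list <- function(s, newval) {
  nums <- strsplit(s, ",", fixed = TRUE)[[1]]
  paste(c(nums[-1], newval), collapse = ",")
}

parts <- strsplit(x, "','", fixed = TRUE)[[1]]  # split into arguments
parts[3] <- shift_list(parts[3], newval)        # only the third argument changes
result <- paste(parts, collapse = "','")        # reassemble the call
result
```

For a list with a single number, `shift_list` returns just `newval`, which matches row 4 of the desired data.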
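Everything below this line is no longer needed.
Two approaches: the first (gsub) changes every instance of the largest number found, which may or may not be a problem; the second (`regmatches<-`) replaces only the match selected by which.max, so it always replaces at most one number.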
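The difference only shows up when the largest number occurs more than once. The sketch below uses a made-up input string (not from the question's data) in which the maximum is repeated, and uses `sub` as a simple stand-in for "replace only one match"; it is an illustration of the distinction, not the code of either approach.

```r
# Hypothetical input in which the largest number appears twice
x <- "Functionname('parameter1blue','parameter2red','14681,14681','Simple','2018-07-26')"
pat <- "\\b14681\\b"

# gsub replaces every occurrence of the number
all_replaced <- gsub(pat, "XXXXXX", x)

# sub replaces only the first occurrence
first_replaced <- sub(pat, "XXXXXX", x)

all_replaced
first_replaced
```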
gsub
gr <- gregexpr("[0-9]+", olddata$sqlcode)
str( nums <- regmatches(olddata$sqlcode, gr) )
# List of 6
# $ : chr [1:7] "1" "2" "14246" "14681" ...
# $ : chr [1:10] "1" "2" "13027" "13559" ...
# $ : chr [1:7] "1" "2" "13587" "42254" ...
# $ : chr [1:6] "1" "2" "14246" "2018" ...
# $ : chr [1:7] "1" "2" "14246" "14681" ...
# $ : chr [1:7] "1" "2" "13587" "42243" ...
str( inds <- sapply(nums, function(n) which.max(as.integer(n))) )
# int [1:6] 4 7 4 3 4 4
str( replacethese <- mapply(`[[`, nums, inds) )
# chr [1:6] "14681" "14681" "42254" "14246" "14681" "42243"
mapply(function(strings,old) gsub(paste0("\\b", old, "\\b"), newval, strings),
olddata$sqlcode, replacethese)
# [1] "Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')"
# [2] "Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')"
# [3] "Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')"
# [4] "Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')"
# [5] "Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')"
# [6] "Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')"
`regmatches<-`
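N.B.: this method works by side effect, changing the data in place (within the frame); if that is a problem, work on a copy of the data instead.
Starting from the unchanged data, the only requirement is that the strings be character, not factor. (If you add stringsAsFactors=FALSE to your calls to data.frame, read.table, read.csv, etc., this will not be an issue.)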
olddata$sqlcode <- as.character(olddata$sqlcode)
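As a quick check of the character-versus-factor point, the sketch below builds a small data frame with made-up strings and `stringsAsFactors = TRUE` forced on, then converts the column. Note that since R 4.0.0 the default for `stringsAsFactors` is `FALSE`, so the conversion mostly matters on older versions or when factors are requested explicitly.

```r
# Made-up data; stringsAsFactors = TRUE is forced here to show the factor case
df <- data.frame(sqlcode = c("f('a','1,2')", "f('b','3')"),
                 stringsAsFactors = TRUE)
was_factor <- is.factor(df$sqlcode)

# Convert to character before using regmatches<-
df$sqlcode <- as.character(df$sqlcode)
is_char <- is.character(df$sqlcode)
```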
We need a function to index the return value of gregexpr. It is simple, but because the attributes need to be indexed too, it looks a little noisy:
index_reg <- function(gr, i) {
newgr <- gr[i]
attributes(newgr) <- attributes(gr)
attr(newgr, "match.length") <- attr(newgr, "match.length")[i]
newgr
}
With that in place, we can do:
gr <- gregexpr("[0-9]+", olddata$sqlcode) # no change
nums <- regmatches(olddata$sqlcode, gr) # no change
inds <- sapply(nums, function(n) which.max(as.integer(n))) # no change
regmatches(olddata$sqlcode, mapply(index_reg, gr, inds, SIMPLIFY=FALSE)) <- newval
olddata # changed in-place, SIDE-EFFECT!
# sqlcode
# 1 Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')
# 2 Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')
# 3 Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')
# 4 Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')
# 5 Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')
# 6 Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')
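If the in-place side effect is unwanted, one option is to wrap the same steps in a function so that only a local copy is modified. This is a sketch: the wrapper name `replace_max_number` is hypothetical, `index_reg` is repeated from above so the block is self-contained, and it assumes every string contains at least one number.

```r
# Same helper as above: keep only the i-th match of one gregexpr element
index_reg <- function(gr, i) {
  newgr <- gr[i]
  attributes(newgr) <- attributes(gr)
  attr(newgr, "match.length") <- attr(newgr, "match.length")[i]
  newgr
}

# Hypothetical wrapper: returns modified strings, leaves its input untouched
replace_max_number <- function(strings, newval) {
  strings <- as.character(strings)        # local copy, coerced from factor if needed
  gr <- gregexpr("[0-9]+", strings)
  nums <- regmatches(strings, gr)
  inds <- sapply(nums, function(n) which.max(as.integer(n)))
  regmatches(strings, mapply(index_reg, gr, inds, SIMPLIFY = FALSE)) <- newval
  strings
}

x <- c("Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')",
       "Functionname('parameter1green','parameter2green','14246','Simple','2018-07-26')")
out <- replace_max_number(x, "XXXXXX")
out
```

Because the assignment happens inside the function, `x` keeps its original values and the result has to be assigned explicitly, for example `olddata$sqlcode <- replace_max_number(olddata$sqlcode, newval)`.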
Answer 1 (score: 0)
Here is a stringr approach that replaces the last number in the group:
olddata <- data.frame(
sqlcode = c(
"Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')",
"Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,14681','Simple','2018-07-26')",
"Functionname('parameter1white','parameter2red','13587,42254','Complex','2018-07-26')",
"Functionname('parameter1green','parameter2green','14246','Simple','2018-07-26')",
"Functionname('parameter1red','parameter2white','14246,14681','Complex','2018-07-26')",
"Functionname('parameter1blue','parameter2white','13587,42243','Simple','2018-07-26')"
)
)
library(tidyverse)
desireddata <- olddata %>%
mutate(sqlcode = str_replace(sqlcode, "\\d{5}(?=','(Simple|Complex))", "XXXXXX"))
desireddata
#> sqlcode
#> 1 Functionname('parameter1blue','parameter2red','14246,XXXXXX','Simple','2018-07-26')
#> 2 Functionname('parameter1green','parameter2blue','13027,13559,13914,14246,XXXXXX','Simple','2018-07-26')
#> 3 Functionname('parameter1white','parameter2red','13587,XXXXXX','Complex','2018-07-26')
#> 4 Functionname('parameter1green','parameter2green','XXXXXX','Simple','2018-07-26')
#> 5 Functionname('parameter1red','parameter2white','14246,XXXXXX','Complex','2018-07-26')
#> 6 Functionname('parameter1blue','parameter2white','13587,XXXXXX','Simple','2018-07-26')
Created on 2018-08-10 by the reprex package (v0.2.0).
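For readers who prefer not to load the tidyverse, the same lookahead pattern can be sketched in base R with `sub` and `perl = TRUE`. This is an assumed equivalent rather than part of the answer; it also relaxes `\\d{5}` to `\\d+` so that numbers of any length are matched. Like the stringr version, it replaces the last number in the group in place.

```r
x <- c("Functionname('parameter1blue','parameter2red','14246,14681','Simple','2018-07-26')",
       "Functionname('parameter1green','parameter2green','14246','Complex','2018-07-26')")

# (?=...) is a lookahead: the digits must be followed by ','Simple or ','Complex,
# but that trailing text is not part of the match and so is left untouched
out <- sub("\\d+(?=','(Simple|Complex))", "XXXXXX", x, perl = TRUE)
out
```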