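Replace a range of numbers with the individual numbers in a string

Date: 2018-03-18 03:12:28

Tags: r text replace tm tidytext

Is there a way to replace a range of numbers in a string with the individual numbers it covers? The range has the form n-n, most likely something like 1-15, but 4-10 is also possible.

The range can be indicated with a) a hyphen (-)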

a <- "I would like to buy 1-3 cats"

or with b) words, for example: to, bis, jusqu'à

b <- "I would like to buy 1 jusqu'à 3 cats"

The result should be

"I would like to buy 1,2,3 cats"

I found this: "Replace range of numbers with certain number", but could not get it to work in R.

3 answers:

Answer 0: (score: 6)

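gsubfn in the gsubfn package is like gsub, except that instead of replacing the match with a replacement string it lets the user specify a function (possibly in formula notation, as done here). It passes the matches to the capture groups in the regular expression, i.e. the matches to the parenthesized portions of the regular expression, to the function as separate arguments, and replaces the entire match with the output of the function. Here we match "(\\d+)(-| to | bis | jusqu'à )(\\d+)", which has three capture groups, so the function takes 3 arguments. Inside the function we call seq with the first and third of these. Note that seq accepts character arguments and interprets them as numbers, so we do not have to convert the arguments to numeric ourselves.

This gives the following one-liner: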

library(gsubfn)
s <- c(a, b) # test input strings

gsubfn("(\\d+)(-| to | bis | jusqu'à )(\\d+)", ~ paste(seq(..1, ..3), collapse = ","), s)

giving:

[1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"
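
As a quick check of the two pieces this answer relies on, the sketch below uses only base R (gsubfn is not needed to run it). It shows seq accepting character endpoints, and the same paste/seq expression written as an ordinary three-argument function. The name expand_range is made up for illustration and is not part of gsubfn.

```r
# seq() converts character endpoints to numbers before building the sequence
seq("1", "3")
# [1] 1 2 3

# Hypothetical helper (the name is illustrative): takes the three capture
# groups as separate arguments, the way the formula above receives them
expand_range <- function(from, sep, to) paste(seq(from, to), collapse = ",")

expand_range("1", "-", "3")
# [1] "1,2,3"
expand_range("4", " to ", "10")
# [1] "4,5,6,7,8,9,10"
```

Since gsubfn also accepts ordinary functions as the replacement, a function like this should be usable in place of the formula, although that variant is not shown in the answer itself.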

Answer 1: (score: 2)

This is actually a bit tricky, unless someone has already written a package that does it (I am not aware of one).

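a <- "I would like to buy 1-3 cats"
# start positions of each run of digits followed by non-digits
pos <- unlist(gregexpr("\\d+\\D+", a))
# split into single characters so the digit at each position can be looked up
a_split <- unlist(strsplit(a, ""))
# note: this reads one character per endpoint, so it assumes single-digit numbers
replacement <- paste(seq.int(a_split[pos[1]], a_split[pos[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, a)
# [1] "I would like to buy 1,2,3 cats"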

Edit: to show that the same solution works for arbitrary non-digit characters between the two numbers:

b <- "I would like to buy 1 jusqu'à 3 cats"
pos_b <- unlist(gregexpr("\\d+\\D+", b))
b_split <- unlist(strsplit(b, ""))
replacement <- paste(seq.int(b_split[pos_b[1]], b_split[pos_b[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, b)
# [1] "I would like to buy 1,2,3 cats"

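If you like, you can add arbitrary requirements on the run of non-digit characters. If you need help with that, just share the restrictions on the words or symbols allowed between the numbers!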
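
One way such a restriction could look, as a rough base R sketch: only the separators named in the question are accepted, and each endpoint is read as a whole number rather than a single character, so multi-digit ranges and several ranges per string are handled. The function name replace_ranges and its sep argument are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: expand every "<number><separator><number>" range in x.
# `sep` lists the accepted separators (an assumption taken from the question).
replace_ranges <- function(x, sep = c("-", " to ", " bis ", " jusqu'à ")) {
  pattern <- paste0("[0-9]+(", paste(sep, collapse = "|"), ")[0-9]+")
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # one replacement per matched range; no matches yields character(0)
    vapply(hits, function(hit) {
      ends <- as.integer(unlist(regmatches(hit, gregexpr("[0-9]+", hit))))
      paste(seq(ends[1], ends[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

replace_ranges("I would like to buy 1-3 cats")
# [1] "I would like to buy 1,2,3 cats"
replace_ranges("pages 4 to 6 and 9-11")
# [1] "pages 4,5,6 and 9,10,11"
```

Strings without a recognized separator between two numbers are returned unchanged.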

Answer 2: (score: 2)

Not the most efficient, but ...

s <- c("I would like to buy 1-3 cats",
       "I would like to buy 1 jusqu'à 3 cats",
       "foo 22-33",
       "quux 11-3 bar")

gre <- gregexpr("([0-9]+(-| to | bis | jusqu'à )[0-9]+)", s)
gre2 <- gregexpr('[0-9]+', regmatches(s, gre))

regmatches(s, gre) <- lapply(regmatches(regmatches(s, gre), gre2),
                             function(a) paste(do.call(seq, as.list(as.integer(a))), collapse = ","))
s
# [1] "I would like to buy 1,2,3 cats"          "I would like to buy 1,2,3 cats"         
# [3] "foo 22,23,24,25,26,27,28,29,30,31,32,33" "quux 11,10,9,8,7,6,5,4,3 bar"
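
The replacement step in this answer hinges on do.call(seq, ...) applied to the two endpoints extracted from each match. A minimal base R sketch of just that step, which also shows why "11-3" comes out as a descending sequence (the variable names are illustrative):

```r
# endpoints as they would be extracted from "1-3" and "11-3"
ends_up <- as.integer(c("1", "3"))
ends_down <- as.integer(c("11", "3"))

# do.call(seq, list(from, to)) is the same as calling seq(from, to)
paste(do.call(seq, as.list(ends_up)), collapse = ",")
# [1] "1,2,3"

# seq() counts down when the first endpoint is larger than the second
paste(do.call(seq, as.list(ends_down)), collapse = ",")
# [1] "11,10,9,8,7,6,5,4,3"
```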