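I have a character vector, and I want to replace a substring common to all of its strings with a different substring for each string. I am doing this in R. For example: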
input=c("I like fruits","I like you","I like dudes")
# I need to do something like this
newStrings=c("You","We","She")
gsub("I",newStrings,input)
so that the output looks like:
"You like fruits"
"We like you"
"She like dudes"
However, gsub only uses the first string in newStrings. Any suggestions? Thanks.
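For reference, the behavior described above can be reproduced with a minimal sketch in base R. gsub() is vectorized over its x argument but not over replacement, so only newStrings[1] is used, and R warns about it. The variable name result is introduced here purely for illustration.

```r
input <- c("I like fruits", "I like you", "I like dudes")
newStrings <- c("You", "We", "She")

# gsub() recycles over `x` only; `replacement` must be a single string,
# so just the first element of newStrings is applied to every input.
# R emits a warning in this case, which is silenced here for the demo.
result <- suppressWarnings(gsub("I", newStrings, input))
print(result)
```

Every element comes back starting with "You", which is why an element-wise approach is needed.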
Answer 0 (score: 14)
You can use stringr:
stringr::str_replace_all(input, "I" ,newStrings)
[1] "You like fruits" "We like you"
[3] "She like dudes"
Or, as @David Arenburg suggested:
stringi::stri_replace_all_fixed(input, "I", newStrings)
Benchmark
library(stringi)
library(stringr)
library(microbenchmark)
set.seed(123)
x <- stri_rand_strings(1e3, 10)
y <- stri_rand_strings(1e3, 1)
identical(stringi::stri_replace_all_fixed(x, "I", y), stringr::str_replace_all(x, fixed("I") , y))
# [1] TRUE
identical(stringi::stri_replace_all_fixed(x, "I", y), diag(sapply(y, gsub, pattern = "I", x = x, fixed = TRUE)))
# [1] TRUE
identical(stringi::stri_replace_all_fixed(x, "I", y), mapply(gsub, "I", y, x, USE.NAMES = FALSE, fixed = TRUE))
# [1] TRUE
microbenchmark("stringi: " = stringi::stri_replace_all_fixed(x, "I", y),
               "stringr (optimized): " = stringr::str_replace_all(x, fixed("I"), y),
               "base::mapply (optimized): " = mapply(gsub, "I", y, x, USE.NAMES = FALSE, fixed = TRUE),
               "base::sapply (optimized): " = diag(sapply(y, gsub, pattern = "I", x = x, fixed = TRUE)))
# Unit: microseconds
#                       expr        min          lq        mean      median          uq        max neval cld
#                  stringi:     132.156    137.1165    171.5822    150.3960    194.2345    460.145   100 a
#      stringr (optimized):     801.894    828.7730    947.1813    912.6095    968.7680   2716.708   100 a
# base::mapply (optimized):    2827.104   2946.9400   3211.9614   3031.7375   3123.8940   8216.360   100 a
# base::sapply (optimized):  402349.424 476545.9245 491665.8576 483410.3290 513184.3490 549489.667   100  b
Answer 1 (score: 7)
mapply() is very useful here:
mapply(sub, "I", newStrings, input, USE.NAMES = FALSE, fixed = TRUE)
# [1] "You like fruits" "We like you" "She like dudes"
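The call above relies on positional matching: mapply() hands its vectors to sub() in the order pattern, replacement, x. As a sketch of the same idea with the arguments spelled out, the vectors that vary per element can be passed by name and the constants moved into MoreArgs. The variable name out is only illustrative.

```r
input <- c("I like fruits", "I like you", "I like dudes")
newStrings <- c("You", "We", "She")

# replacement and x are iterated in parallel, one pair per call;
# pattern and fixed are the same for every call, so they go in MoreArgs.
out <- mapply(sub,
              replacement = newStrings,
              x = input,
              MoreArgs = list(pattern = "I", fixed = TRUE),
              USE.NAMES = FALSE)
print(out)
```

Note that sub() replaces only the first match in each string. If the pattern can occur more than once per string, gsub() can be swapped in with the same arguments.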
Answer 2 (score: 2)
You can use sapply for this:
diag(sapply(newStrings,gsub,pattern="I",x=input))
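To see why diag() is needed, it may help to look at the intermediate result. sapply() applies every replacement to every input, producing an n-by-n character matrix, and only the diagonal pairs input i with replacement i. The sketch below uses the illustrative names m and matched.

```r
input <- c("I like fruits", "I like you", "I like dudes")
newStrings <- c("You", "We", "She")

# Each column holds all inputs rewritten with one replacement string,
# so m[i, j] is input[i] with "I" replaced by newStrings[j].
m <- sapply(newStrings, gsub, pattern = "I", x = input)
print(dim(m))

# An off-diagonal cell mixes input 2 with replacement 1.
print(m[2, 1])

# The diagonal keeps only the matching pairs (i, i).
matched <- diag(m)
print(matched)
```

Because all n * n combinations are computed and most are discarded, the work grows quadratically with the length of the vector, which fits with this approach being by far the slowest in the benchmark above.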