Vectorized string replacement shows strange behavior

Time: 2017-07-31 09:42:23

Tags: r regex string

I have a data frame of match patterns and their replacement strings. The first few rows look like this:

> df
  pattern repl
1       1  111
2       2  112
3       3  113
4       5  114
5       6  115

I want to replace strings in a given vector (called str_vector here). Suppose str_vector looks like this:

> str_vector
 [1] "1"  "2"  "3"  "4"  NA  "6"  "7"  "8"  "9"  "10"

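I have not been able to replace the elements of str_vector that match df$pattern with the corresponding df$repl strings. I have read many threads on this problem, but nothing has worked so far. Using qdap, stringr, and stringi returns: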

> qdap::mgsub(df$pattern,df$repl,str_vector)
 [1] "111"           "1111112"       "1111113"       "4"             NA             
 [6] "1111111111114" "7"             "8"             "9"             "1110"

> stringr::str_replace(df$pattern,df$repl,str_vector)
 [1] "1" "2" "3" "5" "6" "1" "2" "3" "5" "6"

> stringi::stri_replace_all_fixed(df$pattern,df$repl,str_vector,vectorize_all = TRUE)
 [1] "1" "2" "3" "5" "6" "1" "2" "3" "5" "6"

Any help would be greatly appreciated.

Best regards, and thank you very much!

Code to reproduce df and str_vector:

df<-structure(list(pattern = c("1", "2", "3", "5", "6"), repl = c("111", 
"112", "113", "114", "115")), .Names = c("pattern", "repl"), row.names = c(NA, 
-5L), class = "data.frame")

str_vector<-c("1", "2", "3", "4", NA, "6", "7", "8", "9", "10")

1 Answer:

Answer 0: (score: 1)

Here is one option:

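# Build a named lookup (names = patterns, values = replacements) and index it
# by str_vector; elements with no exact match (and NA) come back as NA.
v1 <- unname(setNames(df$repl, df$pattern)[str_vector])
# Positions of the elements that were matched.
i1 <- which(!is.na(v1))
# Keep only the span from the first to the last matched element.
v1[i1[1]:i1[length(i1)]]
#[1] "111" "112" "113" NA    NA    "115"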
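
Note that the option above returns NA for unmatched elements and drops everything after the last match. If the goal is instead to leave unmatched elements as they are, a minimal base R sketch with match() might look like the following. It assumes that only whole-element, exact matches should be replaced (so "10" is not touched by the pattern "1"); the names idx and result are introduced here only for illustration.

```r
df <- data.frame(
  pattern = c("1", "2", "3", "5", "6"),
  repl    = c("111", "112", "113", "114", "115"),
  stringsAsFactors = FALSE
)
str_vector <- c("1", "2", "3", "4", NA, "6", "7", "8", "9", "10")

# Row of df$pattern that equals each element exactly, or NA if there is none.
idx <- match(str_vector, df$pattern)

# Take the replacement where a match exists; otherwise keep the original
# element (including the NA) unchanged.
result <- ifelse(is.na(idx), str_vector, df$repl[idx])
result
# "111" "112" "113" "4" NA "115" "7" "8" "9" "10"
```

This may also explain the outputs in the question: stringr::str_replace and stringi::stri_replace_all_fixed take the string to search as their first argument, followed by the pattern and then the replacement, so the calls shown there treat df$pattern as the input string. qdap::mgsub does take the pattern first, but it replaces substrings one pattern after another, which is likely why "2" ends up as "1111112".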