Find a specific string and add that string to a column

Time: 2017-04-27 08:11:52

Tags: r string gsub grepl

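I want to find a string in a vector and then replace it with the corresponding value from a matching vector that has the same length (or length 1). I tried the qdap package, which has a multigsub function, but it just replaces everything. Below is an example of the desired output (and a solution using a loop). Also, I don't want "Jabad" to be matched.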

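df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular", "James is bad", "Jabad is extra"))

replacements <- c("good", "regular", "bad")

df1$status <- NA

# For each word, write it into the status column of every row whose string contains it.
# Note: grepl("bad", ...) also matches inside "Jabad", which is what I want to avoid.
for (i in seq_along(replacements)) {
  df1[grepl(replacements[i], df1$string), "status"] <- replacements[i]
}

df1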
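A minimal sketch of the same loop that should not flag "Jabad": wrapping each word in `\\b` word-boundary anchors makes `grepl()` match whole words only. It rebuilds `df1` with `stringsAsFactors = FALSE` so the block runs on its own; that argument is an addition, not part of the code above.

```r
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular",
                             "James is bad", "Jabad is extra"),
                  stringsAsFactors = FALSE)
replacements <- c("good", "regular", "bad")

df1$status <- NA_character_
for (word in replacements) {
  # \\b anchors the match at word edges, so "bad" no longer matches inside "Jabad"
  hit <- grepl(paste0("\\b", word, "\\b"), df1$string)
  df1$status[hit] <- word
}
df1
```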

Second example:

df1$status <- "Status unknown"

# Mark every row that contains any of the words as "Status known"
for (i in seq_along(replacements)) {
  df1[grepl(replacements[i], df1$string), "status"] <- "Status known"
}

df1
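The loop in the second example can probably be collapsed into one vectorized call: build a single alternation pattern from `replacements` and pass the result of `grepl()` to `ifelse()`. This sketch also adds `\\b` word boundaries (an assumption about the intended matching), so, unlike the loop, it leaves "Jabad is extra" as "Status unknown".

```r
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular",
                             "James is bad", "Jabad is extra"),
                  stringsAsFactors = FALSE)
replacements <- c("good", "regular", "bad")

# Single pattern: \\b(good|regular|bad)\\b
pattern <- paste0("\\b(", paste(replacements, collapse = "|"), ")\\b")

# TRUE/FALSE per row, mapped straight to the two labels
df1$status <- ifelse(grepl(pattern, df1$string), "Status known", "Status unknown")
df1
```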

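What I'm looking for is something like multigsub, where two vectors can be specified: for example, c("...good...", "...best...", "...regular...", "...extra...") would be replaced by c("Good", "Good", "Regular", "Best"). In this case, multigsub would also return the text before/after the word (indicated here by ...), whereas I only want the replacement value.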
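A multigsub-like helper that returns only the replacement value, not the surrounding text, could look roughly like this. `multi_lookup()` is a hypothetical name, not a qdap function; it assumes the patterns are plain words without regex metacharacters, and that the first matching pattern should win when a string contains several.

```r
# multi_lookup() is a hypothetical helper for illustration; it is not part of
# qdap or any other package.
multi_lookup <- function(x, patterns, values) {
  # values may have the same length as patterns, or length 1 (recycled)
  stopifnot(length(values) %in% c(1L, length(patterns)))
  values <- rep_len(values, length(patterns))
  out <- rep(NA_character_, length(x))
  # Walk the patterns backwards so that, if a string contains several of
  # them, the value of the earliest pattern in the list is the one kept
  for (i in rev(seq_along(patterns))) {
    hit <- grepl(paste0("\\b", patterns[i], "\\b"), x)
    out[hit] <- values[i]
  }
  out
}

x <- c("Erik is pretty good", "John is best", "Fred is regular",
       "Jabad is extra", "Jabad is unknown")
patterns <- c("good", "best", "regular", "extra")
res <- multi_lookup(x, patterns, c("Good", "Good", "Regular", "Best"))
res
```

Passing a single value, e.g. `multi_lookup(x, patterns, "Status known")`, recycles it across all patterns, which covers the "same length or 1" case from the question.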

1 answer:

Answer 0 (score: 1)

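If I understand your situation correctly, this does what you want. It uses the str_extract function from the stringr library.

I've added a few extra rows ("Jabad is unknown" and "John is best") to show how unmatched and additional values are handled.

The vector s holds the strings you want to search for, and r holds the replacement value for each of them.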

library(stringr)

df = structure(list(string = structure(c(1L, 2L, 5L, 3L, 4L, 6L), .Label = c("Erik is pretty good",
"Fred is regular", "Jabad is extra", "Jabad is unknown", "James is bad",
"John is best"), class = "factor")), .Names = "string", row.names = c(NA,
-6L), class = "data.frame")

# Search terms (s) and their replacement values (r), paired by position
s = c('good', 'best', 'regular', 'bad', 'extra')
r = c('Good', 'Good', 'Regular', 'Bad', 'Best')
names(r) <- s

# \\b(good|best|regular|bad|extra)\\b: whole words only, so "Jabad" does not match "bad"
pat = paste0("\\b(", paste0(s, collapse = "|"), ")\\b")

# First matching word in each string, or NA if nothing matches
z = str_extract(df$string, pat)

# Lookup function will return NA when input is NA
lookup <- function(x, s, r){
    i = match(x, s)
    if(is.na(i)) return(NA)
    r[[i]]
}

df$Status = sapply(z, lookup, s=s, r=r)

df = transform(df, Status2 = ifelse(is.na(Status), "Status Unknown", "Status Known"))
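The `lookup()` helper can likely be replaced by indexing the named vector `r` directly, because subsetting by an `NA` name returns `NA`. In this sketch, `z` is typed in by hand to stand in for the `str_extract()` output; it is not produced by stringr here.

```r
s <- c('good', 'best', 'regular', 'bad', 'extra')
r <- c('Good', 'Good', 'Regular', 'Bad', 'Best')
names(r) <- s

# Stand-in for str_extract() output: the matched word, or NA if nothing matched
z <- c("good", "regular", "bad", "extra", NA, "best")

# r[z] looks each word up by name (NA stays NA); unname() drops the names copied from r
status <- unname(r[z])
status
```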

The resulting data.frame is:

               string  Status        Status2
1 Erik is pretty good    Good   Status Known
2     Fred is regular Regular   Status Known
3        James is bad     Bad   Status Known
4      Jabad is extra    Best   Status Known
5    Jabad is unknown    <NA> Status Unknown
6        John is best    Good   Status Known
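For readers who would rather not depend on stringr, a base-R sketch of the same pipeline follows. It swaps `str_extract()` for `regexpr()` plus `regmatches()` (with `perl = TRUE` for the `\\b` boundaries), uses named-vector indexing in place of `lookup()`, and builds `df` with `data.frame()` instead of the `dput()` output.

```r
df <- data.frame(string = c("Erik is pretty good", "Fred is regular", "James is bad",
                            "Jabad is extra", "Jabad is unknown", "John is best"),
                 stringsAsFactors = FALSE)
s <- c('good', 'best', 'regular', 'bad', 'extra')
r <- setNames(c('Good', 'Good', 'Regular', 'Bad', 'Best'), s)
pat <- paste0("\\b(", paste0(s, collapse = "|"), ")\\b")

# regexpr() gives the position of the first match per string, or -1 if none
m <- regexpr(pat, df$string, perl = TRUE)
z <- rep(NA_character_, nrow(df))
# regmatches() returns only the matched substrings, in order, so fill just the hits
z[m > 0] <- regmatches(df$string, m)

df$Status <- unname(r[z])
df$Status2 <- ifelse(is.na(df$Status), "Status Unknown", "Status Known")
df
```

For these six rows it should produce the same Status and Status2 columns as the table above.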