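I want to first find strings from one vector in my data, and then replace each one with the corresponding element of a matching vector of the same length (or of length 1). I tried the qdap package, which has the multigsub function, but it just replaces everything. Below is an example of the desired output, along with a solution that uses a loop. Also, I don't want "Jabad" to be found (i.e. "bad" should not match inside it).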
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular", "James is bad", "Jabad is extra"))
replacements <- c("good", "regular", "bad")
df1$status <- NA
for(i in 1:3){
df1[grepl(replacements[i], df1$string), "status"] <- replacements[i]
}
df1
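Note that `grepl("bad", ...)` also matches the "bad" inside "Jabad", so the loop above labels row 4 as "bad". A minimal sketch, assuming only whole-word matches should count, wraps each search word in `\\b` word boundaries and otherwise keeps the loop unchanged:

```r
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular",
                             "James is bad", "Jabad is extra"),
                  stringsAsFactors = FALSE)
replacements <- c("good", "regular", "bad")

df1$status <- NA
for (i in seq_along(replacements)) {
  # \\b anchors the search word, so "bad" no longer matches inside "Jabad"
  hit <- grepl(paste0("\\b", replacements[i], "\\b"), df1$string)
  df1[hit, "status"] <- replacements[i]
}
df1
```

With the word boundaries in place, the "Jabad is extra" row keeps its initial `NA`.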
Second example:
df1$status <- "Status unknown"
for(i in 1:3){
df1[grepl(replacements[i], df1$string), "status"] <- "Status known"
}
df1
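Because the second example only needs to know whether *any* of the search words occurs, the loop can be collapsed into a single vectorized `grepl()` call on one alternation pattern. A sketch, again assuming whole-word matching is what is wanted:

```r
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular",
                             "James is bad", "Jabad is extra"),
                  stringsAsFactors = FALSE)
replacements <- c("good", "regular", "bad")

# Build one pattern of the form \b(good|regular|bad)\b
pat <- paste0("\\b(", paste(replacements, collapse = "|"), ")\\b")

# TRUE where any whole search word occurs, FALSE otherwise
df1$status <- ifelse(grepl(pat, df1$string), "Status known", "Status unknown")
df1
```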
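I am looking for something similar to multigsub in which two vectors can be specified, for example c("...good...", "...best...", "...regular...", "...extra...") would be replaced by c("Good", "Good", "Regular", "Best"). In this case, however, multigsub would return the text before and after the word as well (represented here by ...), whereas I only want the replacement value.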
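For reference, the behaviour described here (return only the replacement for whichever word occurs, dropping the surrounding text) can be sketched in base R with `regexpr()`, `regmatches()` and a named lookup vector. `match_replace()` is a hypothetical helper written for this illustration, not a qdap or base R function, and it assumes the search words are plain words without regex metacharacters:

```r
# Hypothetical helper: for each string, find the first whole-word occurrence of
# any pattern and return the corresponding replacement (NA if none occurs).
# Assumes `patterns` are plain words (no regex metacharacters to escape).
match_replace <- function(x, patterns, replacements) {
  stopifnot(length(replacements) %in% c(1L, length(patterns)))
  # Named lookup vector: pattern -> replacement (length-1 replacements recycled)
  lookup <- setNames(rep_len(replacements, length(patterns)), patterns)
  pat <- paste0("\\b(", paste(patterns, collapse = "|"), ")\\b")
  m <- regexpr(pat, x)                  # start of first match, -1 if none
  found <- rep(NA_character_, length(x))
  found[m != -1] <- regmatches(x, m)    # regmatches() returns only the matched pieces
  unname(lookup[found])                 # indexing by an NA name yields NA
}

strings <- c("Erik is pretty good", "Fred is regular", "James is bad",
             "Jabad is extra", "John is best")
match_replace(strings,
              c("good", "best", "regular", "extra"),
              c("Good", "Good", "Regular", "Best"))
```

Passing a single replacement (e.g. `"Status known"`) recycles it for every pattern, which covers the "length 1" case from the question.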
Answer 0 (score: 1)
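If I understand your situation correctly, this is what you want. It uses the str_extract function from the stringr library.
I have added a few more cases to demonstrate.
The variable s holds the strings you want to search for, and r holds the replacement value for each string found. The pattern is wrapped in \\b word boundaries, so "bad" is not matched inside "Jabad".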
library(stringr)
df = structure(list(string = structure(c(1L, 2L, 5L, 3L, 4L, 6L), .Label = c("Erik is pretty good",
"Fred is regular", "Jabad is extra", "Jabad is unknown", "James is bad",
"John is best"), class = "factor")), .Names = "string", row.names = c(NA,
-6L), class = "data.frame")
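s = c('good', 'best', 'regular', 'bad', 'extra')    # words to search for
r = c('Good', 'Good', 'Regular', 'Bad', 'Best')     # replacement for each word in s
names(r) <- s
# Build \b(good|best|regular|bad|extra)\b so that only whole words match
pat = paste0("\\b(", paste0(s, collapse = "|"), ")\\b")
# First whole-word match in each string, NA if there is none
z = str_extract(df$string, pat)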
# Lookup function will return NA when input is NA
lookup <- function(x, s, r){
i = match(x, s)
if(is.na(i)) return(NA)
r[[i]]
}
df$Status = sapply(z, lookup, s=s, r=r)
df = transform(df, Status2 = ifelse(is.na(Status), "Status Unknown", "Status Known"))
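Since `r` is already a named vector keyed by `s` (via `names(r) <- s`), the `lookup()` helper and the `sapply()` call can be replaced by plain name-based indexing: indexing a named vector with a character `NA` yields `NA`, so rows without a match still come out as `NA`. In this sketch `z` is written out by hand as the values `str_extract(df$string, pat)` returns for the six example rows, so the snippet runs without stringr:

```r
s <- c("good", "best", "regular", "bad", "extra")
r <- c("Good", "Good", "Regular", "Bad", "Best")
names(r) <- s

# Values str_extract(df$string, pat) yields for the six example rows
z <- c("good", "regular", "bad", "extra", NA, "best")

# Name-based indexing replaces lookup()/sapply(); an NA name gives NA
status <- unname(r[z])
status2 <- ifelse(is.na(status), "Status Unknown", "Status Known")
data.frame(Status = status, Status2 = status2)
```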
The resulting data.frame is:
               string  Status        Status2
1 Erik is pretty good    Good   Status Known
2     Fred is regular Regular   Status Known
3        James is bad     Bad   Status Known
4      Jabad is extra    Best   Status Known
5    Jabad is unknown    <NA> Status Unknown
6        John is best    Good   Status Known