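I have a very long vector. Each element is a string, and each string can be split into substrings separated by ', '.
I want to check whether each string in my vector contains at least one substring with a 'bad' string in it. If it does, the whole SUBstring containing that 'bad' string should be replaced with a new string. I wrote a long function with loops, but I could swear there must be a simpler way, maybe using stringr? Thanks a lot for your advice!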
# Create an example data frame:
test <- data.frame(a = c("str1_element_1_aaa, str1_element_2",
                         "str2_element_1",
                         "str3_element_1, str3_element_2_aaa, str3_element_3"),
                   stringsAsFactors = FALSE)
test
str(test)
# Defining my long function that checks if each string in a
# vector contains a substring with a "bad" string in it.
# If it does, that whole substring is replaced with a new string:
library(stringr)
mystring_replace <- function(strings_vector, badstring, newstring) {
  with_string <- grepl(badstring, strings_vector) # which elements contain badstring?
  if (!any(with_string)) return(strings_vector)   # nothing to replace
  # split only the elements that contain badstring, based on ', '
  # (uses the strings_vector argument, not the global test$a)
  mysplits <- str_split(string = strings_vector[with_string], pattern = ', ')
  for (i in seq_along(mysplits)) { # loop through the list of splits
    allstrings <- mysplits[[i]]
    for (ii in seq_along(allstrings)) { # loop through substrings
      if (grepl(badstring, allstrings[ii])) mysplits[[i]][ii] <- newstring
    }
  }
  for (i in seq_along(mysplits)) { # merge the split elements back together
    mysplits[[i]] <- paste(mysplits[[i]], collapse = ", ")
  }
  strings_vector[with_string] <- unlist(mysplits)
  return(strings_vector)
}
# Test
mystring_replace(test$a, badstring = '_aaa', newstring = "NEW")
Answer 0 (score: 1)
I think this might do it?
new_str_replace <- function(strings_vector, badstring, newstring){
  split.dat <- strsplit(strings_vector, ', ')[[1]]
  split.dat[grepl(badstring, split.dat)] <- newstring
  return(paste(split.dat, collapse = ', '))
}
results <- unname(sapply(test$a, new_str_replace, badstring = '_aaa', newstring = 'NEW'))
results
#[1] "NEW, str1_element_2" "str2_element_1"
#[3] "str3_element_1, NEW, str3_element_3"
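Answer 1 (score: 1)
I did it in a divide-and-conquer way: first I wrote a function that performs the operation on a single string only, and then I vectorized it.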
# does the operation for a string only. divide-and-conquer
replace_one = function(string, badstring, newstring) {
  # split it at ", "
  strs = str_split(string, ", ")[[1]]
  # an ifelse to find the ones containing badstring and replace them
  strs = ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs)
  # join them again
  paste0(strs, collapse = ", ")
}
# vectorizes it
my_replace = Vectorize(replace_one, "string", USE.NAMES = FALSE)
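The answer does not show a call, so here is a minimal usage sketch on the example data from the question. As an assumption made only to keep the snippet free of package dependencies, base R's `strsplit()` is swapped in for `stringr::str_split()`, and the input is a plain character vector `a` rather than the `test$a` column; otherwise the logic is the same as `replace_one` above.

```r
# Example input from the question, as a plain character vector.
a <- c("str1_element_1_aaa, str1_element_2",
       "str2_element_1",
       "str3_element_1, str3_element_2_aaa, str3_element_3")

# Same logic as replace_one above; strsplit() replaces str_split()
# here (an assumption, so the sketch runs without stringr).
replace_one <- function(string, badstring, newstring) {
  strs <- strsplit(string, ", ", fixed = TRUE)[[1]]
  strs <- ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs)
  paste0(strs, collapse = ", ")
}

# Vectorize over the "string" argument only.
my_replace <- Vectorize(replace_one, "string", USE.NAMES = FALSE)

result <- my_replace(a, badstring = "_aaa", newstring = "NEW")
print(result)
# [1] "NEW, str1_element_2"                 "str2_element_1"
# [3] "str3_element_1, NEW, str3_element_3"
```

Because `fixed = TRUE` is used in `grepl()`, `badstring` is matched literally rather than as a regular expression, which is usually what you want for markers such as `_aaa`.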
Answer 2 (score: 1)
Here is an approach using the tidyverse, purrr, and stringr:
library(tidyverse)
library(stringr)
# Small utility function
find_and_replace <- function(string, bad_string, replacement_string) {
  ifelse(str_detect(string, bad_string), replacement_string, string)
}
str_split(test$a, ", ") %>%
  map(find_and_replace, "aaa", "NEW") %>%
  map_chr(paste, collapse = ", ") %>%
  unlist
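Basically: split the vector into a list, map find_and_replace over that list, then collapse the results. I recommend inspecting the result after each pipe %>% separately.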
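One way to inspect each stage separately is to store the intermediate results in named variables. The sketch below is an illustration under assumptions, not the answer's exact code: it uses base R stand-ins (`strsplit()`, `lapply()`, `vapply()`, `grepl()`) for `str_split()`, `map()`, `map_chr()`, and `str_detect()` so that it runs without any packages, and the variable names `pieces`, `replaced`, and `collapsed` are hypothetical.

```r
# Example input from the question, as a plain character vector.
a <- c("str1_element_1_aaa, str1_element_2",
       "str2_element_1",
       "str3_element_1, str3_element_2_aaa, str3_element_3")

# Base R stand-in for the utility function (grepl instead of str_detect).
find_and_replace <- function(string, bad_string, replacement_string) {
  ifelse(grepl(bad_string, string, fixed = TRUE), replacement_string, string)
}

# Step 1: split each element into its substrings (a list of character vectors).
pieces <- strsplit(a, ", ", fixed = TRUE)
str(pieces)

# Step 2: replace every substring that contains the bad string.
replaced <- lapply(pieces, find_and_replace, "aaa", "NEW")
str(replaced)

# Step 3: collapse each list element back into a single string.
collapsed <- vapply(replaced, paste, character(1), collapse = ", ")
print(collapsed)
```

Calling `str()` after each step shows how the data changes shape: a character vector becomes a list of character vectors, and the final step turns it back into a character vector of the original length.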