Question

I need help creating variables based on regular expressions.

Here is my data frame:

df <- data.frame(a=c("blue", "red", "yellow", "yellow", "yellow", "yellow", "red"), b=c("apple", "orange", "peach", "lemon", "pineapple", "tomato", NA))

Basically, this is what I want to do, but in a single step:

regx_1 <- as.numeric(grep("^[a-z]{5}$", df$b))
regx_2 <- as.numeric(grep("^[a-z]{6,}$", df$b))
df$fruit_1 <- NA
df$fruit_1[regx_1 + 1] <- as.character(df$b[regx_1])

df$fruit_2 <- NA
df$fruit_2[regx_2 + 1] <- as.character(df$b[regx_2])

Here is my attempt:

regex1 <- "^[a-z]{5}$"
regex2 <- "^[a-z]{6,}$"
regex <- c(regex1, regex1)

make_non_matches_NA <- function(vec, pattern){
  df[[newvariable]] <- NA
  df[[newvariable]][as.numeric(grep(pattern, vec)) + 1] <- as.character(vec[as.numeric(grep(pattern, vec))])
  return(newvariable)
}

df[c("fruit1", "fruit2")] <- lapply(regex, make_non_matches_NA, vec = df$b)

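Edit: why is my approach wrong? (Note that the real problem is larger, so I have to stick with an approach that avoids repeating the pattern.)

Thanks a lot for your help!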

Answer 1

Numbered objects in the workspace are a sign that they really belong in a list, where they are formally linked and easier to work with. So let's do that first.

regex <- list(regex1, regex2)

Our core task is to copy the source vector but remove the elements that don't match, leaving NA in their place. We'll write a function for that and give it an explicit name, so that we (and our next SO colleague ;)) can see at a glance what it does:

make_non_matches_NA <- function(vec, pattern){
  # logical indices of matches
  matches_lgl <- grepl(pattern, vec)
  # the elements which don't match should be NA
  vec[!matches_lgl] <- NA
  # resulting vector should be returned
  vec
}

Let's test it with the first pattern:

make_non_matches_NA(df$b, regex[[1]])
#> [1] apple <NA>  peach lemon <NA>  <NA>
#> Levels: apple lemon orange peach pineapple tomato

So far so good! Now let's test it with all the regular expressions. We can usually avoid for loops in R because we have cleaner tools such as lapply(). Here I want to apply this function to every regular expression:

lapply(regex, make_non_matches_NA, vec = df$b)
#> [[1]]
#> [1] apple <NA>  peach lemon <NA>  <NA>
#> Levels: apple lemon orange peach pineapple tomato
#>
#> [[2]]
#> [1] <NA>      orange    <NA>      <NA>      pineapple tomato
#> Levels: apple lemon orange peach pineapple tomato

Great, it works!

But I want the result in the data.frame rather than as a separate list, so I'll assign it directly to the relevant names in df:

df[c("fruit1", "fruit2")] <- lapply(regex, make_non_matches_NA, vec = df$b)
# then print my updated df
df
#>   a         b fruit1    fruit2
#> 1 1     apple  apple      <NA>
#> 2 2    orange   <NA>    orange
#> 3 3     peach  peach      <NA>
#> 4 4     lemon  lemon      <NA>
#> 5 5 pineapple   <NA> pineapple
#> 6 6    tomato   <NA>    tomato

Tada!
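As a side note on the "why is my approach wrong?" part: the attempt in the question stops with an error because `newvariable` is never defined inside the function. Even with that fixed, assigning into `df` inside a function only changes a local copy, and the function returns a name rather than the new vector. The following is only a minimal sketch of a version that also keeps the one-row shift (`+ 1`) from the question's first snippet. The helper name `shift_matches` and the `patterns` vector are made up for illustration, and dropping a match on the last row is an assumption about the desired behaviour.

```r
df <- data.frame(
  a = c("blue", "red", "yellow", "yellow", "yellow", "yellow", "red"),
  b = c("apple", "orange", "peach", "lemon", "pineapple", "tomato", NA),
  stringsAsFactors = FALSE
)

# Named patterns: the names become the new column names.
patterns <- c(fruit_1 = "^[a-z]{5}$", fruit_2 = "^[a-z]{6,}$")

# Hypothetical helper: place each matching value one row below where it was found.
shift_matches <- function(vec, pattern) {
  out <- rep(NA_character_, length(vec))
  hits <- grep(pattern, vec)        # integer positions of the matches
  hits <- hits[hits < length(vec)]  # assumption: a match on the last row is dropped
  out[hits + 1] <- as.character(vec[hits])
  out                               # return the vector, not a column name
}

df[names(patterns)] <- lapply(patterns, shift_matches, vec = df$b)
```

Because the function returns a plain vector and the assignment happens outside it, adding another pattern only means adding another named element to `patterns`.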

Answer 2

I don't think this counts as "one step", but you could try mutate from the dplyr package:

df <- data.frame(a=c(1:6), b=c("apple", "orange", "peach", "lemon", "pineapple", "tomato"), stringsAsFactors = FALSE)

dplyr::mutate(df,
  fruit_1 = dplyr::if_else(grepl("^[a-z]{5}$", b), b, NA_character_),
  fruit_2 = dplyr::if_else(grepl("^[a-z]{6}$", b), b, NA_character_))
  a         b fruit_1 fruit_2
1 1     apple   apple    <NA>
2 2    orange    <NA>  orange
3 3     peach   peach    <NA>
4 4     lemon   lemon    <NA>
5 5 pineapple    <NA>    <NA>
6 6    tomato    <NA>  tomato

Note that I set stringsAsFactors = FALSE in the data.frame.
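One detail worth checking when adapting this: the pattern `^[a-z]{6}$` matches exactly six letters, while the question's `^[a-z]{6,}$` matches six or more, which is why pineapple ends up as NA here. A small sketch of the difference follows. It uses base R's `ifelse` in place of `dplyr::if_else` purely to stay dependency-free, and the variable names are illustrative.

```r
b <- c("apple", "orange", "peach", "lemon", "pineapple", "tomato")

# Exactly six lowercase letters: "pineapple" (nine letters) does not match.
exactly_six <- ifelse(grepl("^[a-z]{6}$", b), b, NA_character_)

# Six or more lowercase letters, as in the question: "pineapple" is kept.
six_or_more <- ifelse(grepl("^[a-z]{6,}$", b), b, NA_character_)
```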


Creating variables based on regular expressions using a loop in R

2 answers: