Matching strings with regex: exact match with special characters

Time: 2020-09-10 02:28:45

Tags: r regex string stringr stringi

Here is the previously solved thread: matching strings regex exact match (thanks to @Onyambu for the updated code).

I need to match strings exactly, even when they contain special characters.

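Note: sorry, this is my third question on this problem. I'm almost there, but now I don't know how to handle special characters, and I'm still on a steep learning curve with string handling in R.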

Updated for clarity:

I have a table of match words/strings (codes) like this:

codes <- structure(
  list(
    column1 = structure(
      c(2L, 3L, NA),
      .Label = c("",
                 "4+", "4 +"),
      class = "factor"
    ),
    column2 = structure(
      c(1L,
        3L, 2L),
      .Label = c("old", "the money", "work"),
      class = "factor"
    ),
    column3 = structure(
      c(3L, 2L, NA),
      .Label = c("", "wonderyears",
                 "woke"),
      class = "factor"
    )
  ),
  row.names = c(NA,-3L),
  class = "data.frame"
)

I also have a dataset with a column of strings. I want to check whether each record in strings contains any of the codes:

strings<- structure(
  list(
    SurveyID = structure(
      1:4,
      .Label = c("ID_1", "ID_2",
                 "ID_3", "ID_4"),
      class = "factor"
    ),
    Open_comments = structure(
      c(2L,
        4L, 3L, 1L),
      .Label = c(
        "I need to pick up some apples",
        "The system works",
        "Flag only if there is a 4 with a plus",
        "Show me the money"
      ),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-4L)
)

I'm currently using the following code to match the codes against the strings:

strings[names(codes)] <- lapply(codes, function(x) 
  +(grepl(paste0("\\b", na.omit(x), "\\b", collapse = "|"), strings$Open_comments)))
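To see why ID_3 gets flagged, this minimal sketch rebuilds the column1 pattern the same way the lapply() call does. The vector c("4+", "4 +") is an assumption read off the factor labels of codes$column1, with the NA row already dropped:

```r
# Decoded labels of codes$column1 (assumed from the factor labels above)
column1 <- c("4+", "4 +")

# Same construction as in the lapply() call
pattern <- paste0("\\b", column1, "\\b", collapse = "|")
print(pattern)
# [1] "\\b4+\\b|\\b4 +\\b"

# Unescaped, "4+" means "one or more 4s" and "4 +" means "a 4 followed by
# one or more spaces", so neither alternative requires a plus sign
grepl(pattern, "Flag only if there is a 4 with a plus")
# [1] TRUE
```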

Output:

  SurveyID                         Open_comments column1 column2 column3
1     ID_1                      The system works       0       0       0
2     ID_2                     Show me the money       0       1       0
3     ID_3 Flag only if there is a 4 with a plus       1       0       0
4     ID_4         I need to pick up some apples       0       0       0

Problem: for row 3 (ID_3), I only want a flag if the string contains "4+" or "4 +", but it gets flagged anyway. Is there a way to capture only the exact match?

1 answer:

Answer 0 (score: 2)

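We can escape the "+" so that it is matched literally. In a regular expression "+" is a quantifier meaning "one or more of the preceding item", so "4+" matches the lone "4" in "a 4 with a plus"; "\\+" matches an actual plus sign.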

+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
     collapse="|"), strings$Open_comments))
#[1] 0 0 0 0
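The answer escapes only "+". If other codes might contain further metacharacters (".", "*", "?", "(", "[", "|" and so on), a more general option is to escape all of them before building the pattern. A minimal base-R sketch; escape_regex is a hypothetical helper written for illustration, not a function from base R, stringr or stringi:

```r
# Hypothetical helper: put a backslash in front of every regex metacharacter.
# perl = TRUE so the bracket expression is parsed by PCRE.
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

column1 <- c("4+", "4 +")   # decoded labels of codes$column1 (assumed)
pattern <- paste0("(", escape_regex(column1), ")", collapse = "|")

# FALSE for the first string (no plus sign), TRUE for the second
grepl(pattern, c("Flag only if there is a 4 with a plus",
                 "Flag only if there is a 4+ with a plus"))
```

For the two column1 codes this produces the same pattern as the answer's gsub("\\+", "\\\\+", ...) call, but it also covers codes such as "a.b" or "(x)".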

If we use a string that actually contains "4+", it is picked up:

+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
     collapse="|"), "Flag only if there is a 4+ with a plus"))
#[1] 1
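If a code only needs to appear as a literal substring, with no word-boundary requirement, grepl() also accepts fixed = TRUE, which switches regex interpretation off so nothing has to be escaped. Because fixed = TRUE takes one pattern at a time, this sketch tests each code separately and combines the results with a logical OR; column1 and comments are illustrative values:

```r
column1  <- c("4+", "4 +")   # decoded labels of codes$column1 (assumed)
comments <- c("Flag only if there is a 4 with a plus",
              "Flag only if there is a 4+ with a plus",
              "Flag only if there is a 4 + with a plus")

# One logical vector per code (literal matching), then OR them together
hits <- Reduce(`|`, lapply(column1, grepl, x = comments, fixed = TRUE))
as.integer(hits)
# [1] 0 1 1
```

The trade-off is that fixed matching has no notion of word boundaries, so a code such as "work" would also flag "The system works".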

For multiple columns:

sapply(codes, function(x)+(grepl(paste0( "\\b(", 
      gsub("\\+", "\\\\+", na.omit(x)), ")\\b",
      collapse="|"), strings$Open_comments)))
#     column1 column2 column3
#[1,]       0       0       0
#[2,]       0       1       0
#[3,]       0       0       0
#[4,]       0       0       0
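One caveat with the multi-column version: "\\b" only matches between a word character and a non-word character, and "+" is not a word character. So "\\b(4\\+)\\b" does not match "4+" when it is followed by a space or punctuation, as in "a 4+ with a plus"; none of the four sample comments contains a code, so the 0s above are unaffected. A hedged sketch of one alternative, assuming PCRE (perl = TRUE): replace "\\b" with the lookarounds "(?<!\\w)" and "(?!\\w)" ("not preceded / not followed by a word character"), which behave like "\\b" around ordinary words but also work next to symbols. escape_regex and flag_codes are hypothetical helpers that expect character vectors, so the factor labels are decoded by hand here:

```r
# The column1 part of the answer's pattern misses "4+" before a space
grepl("\\b(4\\+)\\b", "Flag only if there is a 4+ with a plus")
# [1] FALSE

# Hypothetical helper: escape every regex metacharacter
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: 1 if any code occurs as a standalone token, else 0
flag_codes <- function(codes, text) {
  codes <- codes[!is.na(codes) & codes != ""]   # drop NA and empty codes
  pattern <- paste0("(?<!\\w)(?:", paste(escape_regex(codes), collapse = "|"),
                    ")(?!\\w)")
  as.integer(grepl(pattern, text, perl = TRUE))
}

# Character versions of the sample data (factor labels decoded by hand)
codes <- list(
  column1 = c("4+", "4 +", NA),
  column2 = c("old", "work", "the money"),
  column3 = c("woke", "wonderyears", NA)
)
comments <- c("The system works",
              "Show me the money",
              "Flag only if there is a 4 with a plus",
              "I need to pick up some apples",
              "Flag only if there is a 4+ with a plus")

sapply(codes, flag_codes, text = comments)
```

For the four original comments this gives the same 0/1 layout as the answer's output; the extra fifth comment, which contains "4+", gets a 1 in column1.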