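This follows up on a resolved thread: matching strings regex exact match (thanks to @Onyambu for the updated code).
I need to match strings exactly, even when they contain special characters.
Note: apologies, this is my third question on this problem. I'm almost there, but now I don't know how to handle special characters, and I'm still climbing the learning curve when it comes to handling strings in R.
Updated for clarity:
I have a table of match words/strings like this: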
codes <- structure(
list(
column1 = structure(
c(2L, 3L, NA),
.Label = c("",
"4+", "4 +"),
class = "factor"
),
column2 = structure(
c(1L,
3L, 2L),
.Label = c("old", "the money", "work"),
class = "factor"
),
column3 = structure(
c(3L, 2L, NA),
.Label = c("", "wonderyears",
"woke"),
class = "factor"
)
),
row.names = c(NA,-3L),
class = "data.frame"
)
I also have a dataset with a column of strings. I want to check whether each record's string contains any of the codes:
strings<- structure(
list(
SurveyID = structure(
1:4,
.Label = c("ID_1", "ID_2",
"ID_3", "ID_4"),
class = "factor"
),
Open_comments = structure(
c(2L,
4L, 3L, 1L),
.Label = c(
"I need to pick up some apples",
"The system works",
"Flag only if there is a 4 with a plus",
"Show me the money"
),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-4L)
)
I am currently using the following code to match the codes against the strings:
strings[names(codes)] <- lapply(codes, function(x)
+(grepl(paste0("\\b", na.omit(x), "\\b", collapse = "|"), strings$Open_comments)))
Output:
  SurveyID                         Open_comments column1 column2 column3
1     ID_1                      The system works       0       0       0
2     ID_2                     Show me the money       0       1       0
3     ID_3 Flag only if there is a 4 with a plus       1       0       0
4     ID_4         I need to pick up some apples       0       0       0
Problem: row 3 (ID_3). I only want to flag it when the string contains "4+" or "4 +", but it gets flagged regardless. Is there a way to capture an exact match?
Answer 0 (score: 2)
We can escape the + so that it is evaluated literally:
+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
collapse="|"), strings$Open_comments))
#[1] 0 0 0 0
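To make the reason for the escape concrete, here is a minimal sketch. It assumes nothing beyond base R; `codes1` and `pattern` are illustrative names, not part of the original code. Unescaped, `+` is a quantifier, so `4+` means "one or more 4s" and matches a bare 4.

```r
# Unescaped, "+" is a quantifier: "4+" matches one or more 4s.
unescaped_hit <- grepl("4+", "Flag only if there is a 4 with a plus")

# Escaped, "\\+" is a literal plus sign, so a bare 4 no longer matches.
escaped_miss <- grepl("4\\+", "Flag only if there is a 4 with a plus")
escaped_hit  <- grepl("4\\+", "Flag only if there is a 4+ with a plus")

# The pattern that the gsub()/paste0() combination builds for column1
# (codes1 is an illustrative stand-in for na.omit(codes$column1)).
codes1  <- c("4+", "4 +")
pattern <- paste0("(", gsub("\\+", "\\\\+", codes1), ")", collapse = "|")
writeLines(pattern)  # prints (4\+)|(4 \+)
```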
If we use a string that contains 4+, it is picked up:
+(grepl(paste0( "(", gsub("\\+", "\\\\+", na.omit(codes$column1)), ")",
collapse="|"), "Flag only if there is a 4+ with a plus"))
#[1] 1
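If word boundaries are not required, one possible alternative is to skip regular expressions and search for each code as a literal substring with `fixed = TRUE`. The sketch below is an assumption about what might suit this use case, not part of the original answer; `codes1`, `comments`, `hits` and `flag` are illustrative names.

```r
codes1   <- c("4+", "4 +")
comments <- c("The system works",
              "Flag only if there is a 4 with a plus",
              "Flag only if there is a 4+ with a plus")

# fixed = TRUE treats each code as plain text, so no escaping is needed.
# grepl() accepts one pattern at a time, hence the loop over codes.
hits <- lapply(codes1, function(code) grepl(code, comments, fixed = TRUE))

# A comment is flagged when any code was found in it.
flag <- +Reduce(`|`, hits)
print(flag)
```

The trade-off is that a literal search has no notion of word boundaries, so a comment containing "14+" would also be flagged.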
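For multiple columns. A trailing "\\b" cannot match right after a "+" that is followed by a space or the end of the string, so the word-boundary checks are written here as lookarounds, which require perl = TRUE:
sapply(codes, function(x) +(grepl(paste0("(?<!\\w)(",
        gsub("\\+", "\\\\+", na.omit(x)), ")(?!\\w)",
        collapse = "|"), strings$Open_comments, perl = TRUE)))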
#     column1 column2 column3
#[1,]       0       0       0
#[2,]       0       1       0
#[3,]       0       0       0
#[4,]       0       0       0