I am trying to search for strings in order to subset a data frame. My df looks like this:
dput(df)
structure(list(Cause = structure(c(2L, 1L), .Label = c("jasper not able to read the property table after the release",
"More than 7000 messages loaded which stuck up"), class = "factor"),
Resolution = structure(1:2, .Label = c("jobs and reports are processed",
"Updated the property table which resolved the issue."), class = "factor")), .Names = c("Cause",
"Resolution"), class = "data.frame", row.names = c(NA, -2L))
I am trying this:
df1<-subset(df, grepl("*MQ*|*queue*|*Queue*", df$Cause))
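The goal is to search the Cause column for MQ, queue, or Queue and subset the data frame df to the matching records. It does not seem to work: it also captures records in which none of the strings MQ, queue, or Queue is present.
Is this how you would do it, or are there other approaches I could follow?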
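Answer 0 (score: 6)
The regular expression below seems to work. I added a row to the data.frame to make the example more interesting.
I think the problem comes from the * in your regular expression. I also added parentheses to define the groups for |, but I do not think they are mandatory.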
df <- data.frame(Cause=c("jasper not able to read the property table after the release",
"More than 7000 messages loaded which stuck up",
"blabla Queue blabla"),
Resolution = c("jobs and reports are processed",
"Updated the property table which resolved the issue.",
"hop"))
> head(df)
Cause Resolution
1 jasper not able to read the property table after the release jobs and reports are processed
2 More than 7000 messages loaded which stuck up Updated the property table which resolved the issue.
3 blabla Queue blabla hop
> subset(df, grepl("(MQ)|(queue)|(Queue)", df$Cause))
Cause Resolution
3 blabla Queue blabla hop
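To illustrate why the * is a likely culprit, here is a minimal sketch that reuses the Cause strings from above. In a regular expression, * is a quantifier ("zero or more of the preceding item"), not a shell-style wildcard, so MQ* means "M followed by zero or more Q" and matches any string that contains a capital M. The variable names cause, with_star and without_star are only illustrative.

```r
cause <- c("jasper not able to read the property table after the release",
           "More than 7000 messages loaded which stuck up",
           "blabla Queue blabla")

# "MQ*" is "M followed by zero or more Q", so the lone "M" in "More" matches.
with_star <- grepl("MQ*", cause)
print(with_star)     # FALSE  TRUE FALSE

# Without the stars, only a literal "MQ", "queue" or "Queue" matches.
without_star <- grepl("MQ|queue|Queue", cause)
print(without_star)  # FALSE FALSE  TRUE
```

Since grepl already matches anywhere in the string, no leading or trailing wildcard is needed.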
Is this what you wanted?
Answer 1 (score: 1)
Moved from the comments:
subset(df, grepl("MQ|Queue|queue", Cause))
Or, if matching in any letter case is acceptable:
subset(df, grepl("mq|queue", Cause, ignore.case = TRUE))
For more information, try ?regex and ?grepl in R.
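As a rough comparison of the two patterns, the sketch below runs both on a small vector. The values in x are made up for illustration and do not come from the question's data.

```r
# Hypothetical values, chosen only to show the effect of letter case
x <- c("MQ down", "mq restarted", "QUEUE full", "Queue drained", "no match here")

# Case-sensitive alternatives: only the exact spellings listed are found.
exact_case <- grepl("MQ|Queue|queue", x)
print(exact_case)  #  TRUE FALSE FALSE  TRUE FALSE

# ignore.case = TRUE accepts any capitalisation of "mq" or "queue".
any_case <- grepl("mq|queue", x, ignore.case = TRUE)
print(any_case)    #  TRUE  TRUE  TRUE  TRUE FALSE
```

The case-insensitive form is shorter and also catches spellings such as QUEUE that the explicit alternatives miss.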