Sample data
a<-c("hour","four","ruoh", "six", "high", "our")
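I want to find all the strings that contain o, u and h and are 4 characters long; the order of the letters does not matter.
I want to return "hour", "four", "ruoh".
This is my attempt: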
grepl("o+u+r", a)
nchar(a)==4
Answer 0: (score: 2)
To match strings of length 4 that contain the characters h, o and u, use:
grepl("(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)",
c("hour","four","ruoh", "six", "high", "our"),
perl = TRUE)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE
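Since the question asks for the strings themselves rather than a logical vector, the result of grepl can be used as an index. A minimal sketch, assuming the sample vector from the question; the variable names `pattern` and `matched` are only illustrative:

```r
a <- c("hour", "four", "ruoh", "six", "high", "our")

# Each lookahead is checked from the start of the string without consuming
# any characters, so the order of h, o and u in the string does not matter.
pattern <- "(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)"

# Index the vector with the logical result to get the matching strings.
matched <- a[grepl(pattern, a, perl = TRUE)]
print(matched)
# [1] "hour" "ruoh"
```

Note that perl = TRUE is needed because lookaheads are not supported by R's default regex engine.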
(?=^.{4}$): the string has length 4.
(?=.*x): the character x occurs anywhere in the string.
Answer 1: (score: 1)
You can use strsplit and setdiff. I added an extra edge case to the sample data:
a<-c("hour","four","ruoh", "six", "high", "our","oouh")
a[nchar(a) == 4 &
lengths(lapply(strsplit(a,""),function(x) setdiff(x, c("o","u","h")))) == 1]
# [1] "hour" "ruoh"
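One caveat with the setdiff test: it counts the leftover characters rather than checking that every required letter is present, so a word such as "ooux" (a hypothetical extra case, not in the original data) would also pass. A sketch of a stricter variant using %in%; the names `required` and `has_all` are illustrative:

```r
# "ooux" is an added, hypothetical edge case: 4 characters, but no "h".
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh", "ooux")
required <- c("o", "u", "h")

# For each word, check that every required letter occurs among its characters.
has_all <- vapply(strsplit(a, ""),
                  function(x) all(required %in% x),
                  logical(1))

a[nchar(a) == 4 & has_all]
# [1] "hour" "ruoh" "oouh"
```

Unlike the setdiff version, this keeps "oouh", because it only requires each letter to appear at least once.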
Or with grepl:
a[nchar(a) == 4 & !rowSums(sapply(c("o","u","h"), Negate(grepl), a))]
# [1] "hour" "ruoh" "oouh"
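sapply(c("o","u","h"), Negate(grepl), a) gives you a logical matrix marking which words do not contain each letter; rowSums then acts like a row-wise any, because its result is coerced to logical when negated with !.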
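To make the intermediate steps visible, the one-liner can be unpacked as below. This is only a sketch of the same idea; the names `missing_letter`, `n_missing` and `keep` are introduced here for illustration:

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh")

# One row per word, one column per letter; TRUE means the letter is absent.
missing_letter <- sapply(c("o", "u", "h"), Negate(grepl), a)

# Number of required letters each word is missing.
n_missing <- rowSums(missing_letter)

# Keep words of length 4 that are missing none of the letters.
keep <- nchar(a) == 4 & n_missing == 0
a[keep]
# [1] "hour" "ruoh" "oouh"
```

Writing n_missing == 0 is equivalent to !rowSums(...) in the original, since ! treats any non-zero count as TRUE before negating it.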
Answer 2: (score: 1)
Use grepl with the edited approach (with r in place of h):
a<-c("hour","four","ruoh", "six", "high", "our")
a[grepl(pattern="o", x=a) & grepl(pattern="u", x=a) & grepl(pattern="r", x=a) & nchar(a)==4]
Returns:
[1] "hour" "four" "ruoh"
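Because the question and the answers differ on whether the third letter is h or r, it may be convenient to pass the letters as an argument instead of chaining grepl calls by hand. A minimal sketch; `contains_all` and its parameters are hypothetical and not part of the original answers:

```r
# contains_all() is a hypothetical helper written for illustration.
# x:     character vector to filter
# chars: single characters that must all occur in a string
# len:   required string length
contains_all <- function(x, chars, len = 4) {
  # One logical vector per character, combined with element-wise AND.
  hits <- Reduce(`&`, lapply(chars, grepl, x = x, fixed = TRUE))
  x[hits & nchar(x) == len]
}

a <- c("hour", "four", "ruoh", "six", "high", "our")

contains_all(a, c("o", "u", "r"))
# [1] "hour" "four" "ruoh"

contains_all(a, c("o", "u", "h"))
# [1] "hour" "ruoh"
```

fixed = TRUE treats each character literally, which avoids surprises if a regex metacharacter such as "." is ever passed in `chars`.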