Question

Sample data

a<-c("hour","four","ruoh", "six", "high", "our")

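I want to find all strings that contain o & u & h and are exactly 4 characters long; the order of the characters does not matter.

I want to return "hour","four","ruoh". Here is my attempt: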

grepl("o+u+r", a)
nchar(a)==4

Answer 1

To match strings of length 4 that contain the characters h, o and u, use:

grepl("(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)",
      c("hour","four","ruoh", "six", "high", "our"),
      perl = TRUE)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE

(?=^.{4}$): the string has length 4.
(?=.*x): x occurs anywhere in the string.
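The same lookahead pattern can be assembled from a vector of required characters instead of being typed by hand. This is only a sketch: make_pattern is a hypothetical helper name, not part of the answer, and it assumes every required character is a plain letter with no special meaning in a regular expression.

```r
# Hypothetical helper (not from the answer): build the lookahead pattern
# from a vector of required single characters and an exact string length.
# Assumes the characters are plain letters, not regex metacharacters.
make_pattern <- function(chars, len) {
  # One "(?=.*x)" lookahead per required character
  lookaheads <- paste0("(?=.*", chars, ")", collapse = "")
  # Prefix with the lookahead that pins the total length
  paste0("(?=^.{", len, "}$)", lookaheads)
}

a <- c("hour", "four", "ruoh", "six", "high", "our")
pat <- make_pattern(c("h", "o", "u"), 4)

# Lookaheads need the PCRE engine, hence perl = TRUE
a[grepl(pat, a, perl = TRUE)]
# [1] "hour" "ruoh"
```

Because each lookahead is checked from the start of the string without consuming any characters, the order in which the letters appear does not matter.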

Answer 2

You can use strsplit and setdiff; I added an extra edge case to the sample data:

a<-c("hour","four","ruoh", "six", "high", "our","oouh")
a[nchar(a) == 4 &
  lengths(lapply(strsplit(a,""),function(x) setdiff(x, c("o","u","h")))) == 1]
# [1] "hour" "ruoh"

Or with grepl:

a[nchar(a) == 4 & !rowSums(sapply(c("o","u","h"), Negate(grepl), a))]
# [1] "hour" "ruoh" "oouh"

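sapply(c("o","u","h"), Negate(grepl), a) gives you a logical matrix with one row per word and one column per letter, where TRUE means the word does not contain that letter. rowSums then counts the missing letters in each row, and ! coerces that count to logical, so the result is TRUE only for words that are missing none of the letters (the negation of a row-wise any).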
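To make the intermediate steps visible, the one-liner can be split up as follows. This is just an illustrative breakdown; the variable names lacks, n_missing and keep are introduced here and do not appear in the answer.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh")

# One column per letter; TRUE means the word does NOT contain that letter
lacks <- sapply(c("o", "u", "h"), Negate(grepl), a)

# Number of required letters each word is missing
n_missing <- rowSums(lacks)

# !n_missing is TRUE only where the count is 0, i.e. nothing is missing
keep <- nchar(a) == 4 & !n_missing
a[keep]
# [1] "hour" "ruoh" "oouh"
```

Unlike the setdiff version, this keeps "oouh", because it only requires that each letter is present and does not care what the remaining character is.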

Answer 3

Using grepl with your edited approach (r instead of h):

a<-c("hour","four","ruoh", "six", "high", "our")

a[grepl(pattern="o", x=a) & grepl(pattern="u", x=a) & grepl(pattern="r", x=a) & nchar(a)==4]

Returns:

[1] "hour" "four" "ruoh"
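If the list of required letters grows, the chained grepl calls can be folded with Reduce. This is a possible generalization rather than part of the answer; letters_needed and has_all are names chosen for the sketch, and fixed = TRUE is an added assumption so that each letter is matched literally.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our")
letters_needed <- c("o", "u", "r")

# One logical vector per letter, then combine them element-wise with &
has_all <- Reduce(`&`, lapply(letters_needed, grepl, x = a, fixed = TRUE))

a[has_all & nchar(a) == 4]
# [1] "hour" "four" "ruoh"
```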

Find strings of a certain length that contain specific characters

3 answers: