Detecting repeated digits in R using regular expressions

Time: 2014-03-28 04:22:36

Tags: regex r

Shouldn't this code detect repeated digits in R?

> grep(pattern = "\\d{2}", x = 1223)
[1] 1
> grep(pattern = "\\d{3}", x = 1223)
[1] 1

For 988 we should get TRUE, and for 123 we should get FALSE.

Apparently that is not what happens:

> grep(pattern = "\\d{2}", x = "1223")
[1] 1
> grep(pattern = "\\d{2}", x = "13")
[1] 1

2 answers:

Answer 0: (score: 4)

You need to use a backreference:

> grep(pattern = "(\\d)\\1", x = "1224")
[1] 1
> grep(pattern = "(\\d)\\1{1,}", x = "1224")
[1] 1
> grep(pattern = "(\\d)\\1", x = "1234")
integer(0)
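
Since the question asks for TRUE/FALSE rather than match indices, grepl may be a closer fit than grep. The following is only a minimal sketch; the vector name nums and its values are made up for illustration. It contrasts the question's \\d{2}, which accepts any two adjacent digits, with the backreference pattern, which requires the second digit to equal the first.

```r
# Hypothetical input vector; the name `nums` is not from the original post.
nums <- c("988", "123", "1224", "1234")

# \\d{2} matches ANY two adjacent digits, so every string here matches.
any_two <- grepl("\\d{2}", nums)

# (\\d)\\1 matches a digit immediately followed by the same digit.
repeated <- grepl("(\\d)\\1", nums)

print(any_two)
print(repeated)
```

Here repeated is TRUE for "988" and "1224" and FALSE for "123" and "1234", while any_two is TRUE for all four.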

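EDIT: It seems you need to figure out how this works: (\\d) creates a capture group around \\d, which can be referenced with the backreference \\1. For example, if you have numbers like x2y and want to find those where x is the same as y, then:

(\\d)2\\1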
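
To make the x2y case concrete, here is a small sketch. The input values and the names candidates and same_ends are assumptions for illustration, and the ^ and $ anchors are an addition so that only three-digit strings of exactly that shape match.

```r
# Hypothetical three-digit inputs of the form x2y (plus one that is not).
candidates <- c("121", "123", "525", "828", "131")

# (\\d) captures x, the literal 2 follows, and \\1 requires y to equal x.
# The ^ and $ anchors (an addition here) force the whole string to match.
same_ends <- grepl("^(\\d)2\\1$", candidates)

print(candidates[same_ends])
```

This keeps "121", "525" and "828"; "123" fails because the outer digits differ, and "131" fails because the middle digit is not 2.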

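I strongly recommend reading a basic tutorial on regular expressions.

Answer 1: (score: 1)

I know the question explicitly says "using regular expressions" in the title, but here is a non-regex approach that may work, depending on what you want to do.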

strings <- c("1223","1233","1234","113")

# detect consecutive repeat digits, or characters
(strings.rle <- lapply(strings, function(x)rle(unlist(strsplit(x,"")))))

[[1]]
Run Length Encoding
  lengths: int [1:3] 1 2 1
  values : chr [1:3] "1" "2" "3"

[[2]]
Run Length Encoding
  lengths: int [1:3] 1 1 2
  values : chr [1:3] "1" "2" "3"

[[3]]
Run Length Encoding
  lengths: int [1:4] 1 1 1 1
  values : chr [1:4] "1" "2" "3" "4"

[[4]]
Run Length Encoding
  lengths: int [1:2] 2 1
  values : chr [1:2] "1" "3"

Now you can use strings.rle to do whatever you need:

# which entries have consecutive repeat digits, or characters
strings[sapply(strings.rle, function(x) any(x$lengths > 1))]
[1] "1223" "1233" "113"

# which digits or characters are consecutively repeated?
lapply(strings.rle, function(x) x$values[which(x$lengths > 1)])
[[1]]
[1] "2"

[[2]]
[1] "3"

[[3]]
character(0)

[[4]]
[1] "1"
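
If only a TRUE/FALSE answer per string is needed, the run-length idea can be wrapped in a small helper and checked against the backreference approach. This is a sketch under assumptions: the function name has_repeat_rle is hypothetical, and the regex uses (.) instead of (\\d) so that, like rle, it covers any character rather than digits only.

```r
# Hypothetical helper; the name has_repeat_rle is not from the original answer.
has_repeat_rle <- function(x) {
  # Split each string into characters and test for any run longer than 1.
  vapply(strsplit(x, ""),
         function(chars) any(rle(chars)$lengths > 1),
         logical(1))
}

strings <- c("1223", "1233", "1234", "113")

rle_result   <- has_repeat_rle(strings)
regex_result <- grepl("(.)\\1", strings)  # any repeated character, not only digits

print(rle_result)
print(identical(rle_result, regex_result))
```

Both approaches flag every string except "1234". The rle version keeps the run lengths available if they are needed later, while the regex version is shorter when only detection matters.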