R regex repetition ignores the upper limit

Time: 2014-04-16 17:51:40

Tags: regex r grepl

I am trying to write a regular expression that filters strings of the form
blah_blah_suffix

where suffix is any string between 2 and 5 characters long. So I want to accept the strings

blah_blah_aa
blah_blah_abcd

but reject

blah_blah_a
blah_aaa
blah_blah_aaaaaaa

I am using grepl as follows:

samples[grepl("blah_blah_.{2,5}", samples)]

But it seems to ignore the upper limit of the repetition (5): it correctly rejects blah_blah_a and blah_aaa, but it accepts blah_blah_aaaaaaa.

I know the strings can be filtered without a regular expression, but I would like to understand how to use grepl correctly.

2 answers:

Answer 0: (score: 2)

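grepl returns TRUE when the pattern matches anywhere in the string, so .{2,5} is satisfied by the first five characters of a longer suffix and the rest is simply ignored. You need to anchor the expression to the start and end of the line: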

^blah_blah_.{2,5}$

^ matches the start of the line and $ matches the end of the line. See a working example here: Regex101
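
A minimal sketch of the difference in R, assuming a `samples` vector built from the strings in the question (the variable names `unanchored`, `anchored`, and `matched_part` are only illustrative):

```r
# Hypothetical sample data, taken from the strings listed in the question
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# Unanchored: the pattern only has to match somewhere inside the string
unanchored <- samples[grepl("blah_blah_.{2,5}", samples)]

# Anchored: the whole string has to fit the pattern
anchored <- samples[grepl("^blah_blah_.{2,5}$", samples)]

# Show which part of the long string the unanchored pattern matched
long <- "blah_blah_aaaaaaa"
matched_part <- regmatches(long, regexpr("blah_blah_.{2,5}", long))

print(unanchored)
print(anchored)
print(matched_part)
```

The unanchored call still keeps blah_blah_aaaaaaa because the match stops after five suffix characters; the anchored call keeps only blah_blah_aa and blah_blah_abcd.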

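If you want to anchor the expression to the start and end of the whole string (rather than of each line in multi-line input), use \A and \Z instead of ^ and $. In R, \A and \Z are PCRE syntax, so it is safest to pass perl = TRUE to grepl when using them.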
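
As a rough sketch of the string-anchored variant in R, assuming the PCRE engine is selected with `perl = TRUE` (the two-element `samples` vector and the name `kept` are made up for illustration):

```r
# Hypothetical sample data: one string to keep, one to reject
samples <- c("blah_blah_aa", "blah_blah_aaaaaaa")

# \A and \Z anchor to the start and end of the string;
# backslashes are doubled because this is an R string literal
kept <- samples[grepl("\\Ablah_blah_.{2,5}\\Z", samples, perl = TRUE)]

print(kept)
```

For single-line strings like these, the result should be the same as with ^ and $.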

Anchors Tutorial

Answer 1: (score: 1)

/^[\w]+_[\w]+_[\w]{2,5}$/

DEMO

Options: dot matches newline; case insensitive; ^ and $ match at line breaks

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]{2,5}»
   Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
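
To try this pattern with grepl, a sketch along the following lines might work. It assumes the surrounding / delimiters are dropped (R takes the pattern as a plain string) and that each backslash is doubled; `samples`, `generic`, and the extra string "foo_bar_xyz" are illustrative additions, not part of the original answer:

```r
# Hypothetical sample data: the question's strings plus one with a different prefix
samples <- c("blah_blah_aa", "blah_blah_abcd", "blah_blah_a",
             "blah_aaa", "blah_blah_aaaaaaa", "foo_bar_xyz")

# No / delimiters in R; [\w] is written as \\w inside the string literal
generic <- samples[grepl("^\\w+_\\w+_\\w{2,5}$", samples)]

print(generic)
```

Note that this pattern is more permissive than ^blah_blah_.{2,5}$: it accepts any word-character prefix with two underscores, and \w itself also matches underscores.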