Perl regex to match n digits, but only if they are not all the same

Asked: 2013-12-23 19:05:48

Tags: regex perl

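Using a Perl regex, I need to match a series of eight digits, for example 12345678, but only if they are not all the same. 00000000 and 99999999 are typical patterns that should not match. I'm trying to weed out obviously invalid values from existing database records.

I have this: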

my ($match) = /(\d{8})/;

But I can't quite get the backreference arranged correctly.

4 answers:

Answer 0: (score: 9)

How about:

^(\d)(?!\1{7})\d{7}$

This matches any 8-digit number whose digits are not all identical.

Sample code:

use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

my $re = qr/^(\d)(?!\1{7})\d{7}$/;
while (<DATA>) {
    chomp;
    say(/$re/ ? "OK : $_" : "KO : $_");
}

__DATA__
12345678
12345123
123456
11111111

Output:

OK : 12345678
OK : 12345123
KO : 123456
KO : 11111111

Explanation:

The regular expression:

(?-imsx:^(\d)(?!\1{7})\d{7}$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d                       digits (0-9)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    \1{7}                    what was matched by capture \1 (7 times)
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  \d{7}                    digits (0-9) (7 times)
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
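The same lookahead pattern extends to other lengths. Here is a minimal sketch, assuming a hypothetical helper `not_all_same_re` (not part of the answer above) that builds the regex for any n of at least 2. It uses `\z` instead of `$` so that a trailing newline is not accepted, which is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): build the pattern for n digits.
sub not_all_same_re {
    my ($n) = @_;
    die "n must be at least 2\n" if $n < 2;
    my $rest = $n - 1;
    # The first digit is captured; the lookahead rejects $rest more copies of it.
    return qr/^(\d)(?!\1{$rest})\d{$rest}\z/;
}

my $re8 = not_all_same_re(8);
for my $value (qw(12345678 11111111 11111112 1234567)) {
    printf "%-8s %s\n", $value, ($value =~ $re8 ? 'OK' : 'KO');
}
```

Of the four sample values, only 11111111 (all identical) and 1234567 (too short) should be reported as KO.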

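Answer 1: (score: 1)

I would do this with two regexes: one to match what you're looking for, and another to filter out what you don't want.

Inspired by HamZa's answer, I have also included a single-regex solution. Note that both versions below reject any number containing a repeated digit, which is stricter than the all-identical case described in the question.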

use strict;
use warnings;

while (my $num = <DATA>) {
    chomp $num;

    # Single Regex Solution - Inspired by HamZa's code
    if ($num =~ /^.*(\d).*\1.*$(*SKIP)(*FAIL)|^\d{8}$/) {
        print "Yes - ";
    } else {
        print "No  - ";
    }

    # Two Regex Solution
    if ($num =~ /^\d{8}$/ && $num !~ /(\d).*\1/) {
        print "Yes - ";
    } else {
        print "No  - ";
    }

    print "$num\n";
}

__DATA__
12345678
12345674
00001111
00000000
99999999
87654321
87654351
123456789

Results:

Yes - Yes - 12345678
No  - No  - 12345674
No  - No  - 00001111
No  - No  - 00000000
No  - No  - 99999999
Yes - Yes - 87654321
No  - No  - 87654351
No  - No  - 123456789
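In the results above, values such as 12345674 and 00001111 come out as "No" because `(\d).*\1` fires on any digit that appears twice. If only all-identical values should be filtered, one possible adjustment of the two-regex approach is to anchor the backreference. This is a sketch rather than the answerer's code, and the sub name `is_plausible` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true for 8 digits that are not all the same digit.
sub is_plausible {
    my ($num) = @_;
    return $num =~ /^\d{8}\z/        # shape check: exactly eight digits
        && $num !~ /^(\d)\1{7}\z/;   # filter: one digit repeated eight times
}

for my $num (qw(12345678 12345674 00001111 00000000 99999999 123456789)) {
    my $verdict = is_plausible($num) ? 'Yes' : 'No ';
    print "$verdict - $num\n";
}
```

With this variant, 12345674 and 00001111 are accepted, while 00000000, 99999999 and the nine-digit value are still rejected.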

Answer 2: (score: 0)

  

This answer is based on the question title: match n digits, but only if they are not all the same.


So I came up with the following expression:

(\d)\1+\b(*SKIP)(*FAIL)|\d+

What does it mean?

(\d)                # Match a digit and put it in group 1
\1+                 # Match what was matched in group 1 and repeat it one or more times
\b                  # Word boundary, we could use (?!\d) to be more specific
(*SKIP)(*FAIL)      # Skip & fail, we use this to exclude what we just have matched
|                   # Or
\d+                 # Match a digit one or more times

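The advantage of this regex is that you don't need to edit it every time you want to change n. Of course, if you want to match exactly n digits, you can replace the final \d+ with \d{n}\b.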
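Because this pattern is unanchored, it can also pull candidates out of free text. Here is a minimal sketch of the fixed-length variant, assuming n = 8 and a made-up sample string. The leading `\b` on the second branch is an addition for this sketch: without it, the last eight digits of a longer run such as 123456789 would still match.

```perl
use strict;
use warnings;

my $n = 8;    # assumed length for this example

# Branch 1 consumes a run of one repeated digit and discards it via SKIP/FAIL.
# Branch 2 accepts exactly $n digits delimited by word boundaries.
my $re = qr/(\d)\1+\b(*SKIP)(*FAIL)|\b\d{$n}\b/;

my $text = 'ids: 11111111 12345678 00000000 123456789 87654321';
my @found;
while ($text =~ /$re/g) {
    push @found, $&;    # $& holds the text of the last successful match
}
print "@found\n";    # prints "12345678 87654321"
```

The backtracking control verbs `(*SKIP)` and `(*FAIL)` require Perl 5.10 or later.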


Answer 3: (score: -2)

my $number = "99999999";
# Capture the first digit, then use \1{7} to require 7 more copies of it.
# A match means all eight digits are identical, i.e. the value should be rejected.
print "all digits identical\n" if $number =~ /^(\d)\1{7}$/;