Validating and grouping Excel formula format

Asked: 2013-02-20 21:36:16

Tags: regex

I'm trying to validate Excel formula syntax with the following regular expression:

=SUM\(((?:\w+\d+)(?::\w+\d+)?)((?:,\w+\d+)(?::\w+\d+)?)*\)

Against this input:

Should pass

=SUM(A1,A11:A212,A12:A56,A342:A12,A3)
=SUM(A11:A12,A12:a12,A34:A3)
=SUM(A1,A2,A3)
=SUM(A1)

Should fail

=SUM(A11:A212:A2,A12:A56,A4,A342:A12)

The validation part works, but I can't figure out how to capture each comma-separated value in its own group.

How I would like them grouped:

=SUM(A1,A11:A12,A12:A56,A3)     // Groups: $1 = A1 $2 = A11:A12 $3 = A12:A56 $4 = A3
=SUM(A11:A12,A10:A12,A34:A3)    // Groups: $1 = A11:A12 $2 = A10:A12 $3 = A34:A3
=SUM(A1,A2,A3)                  // Groups: $1 = A1 $2 = A2 $3 = A3
=SUM(A1)                        // Groups: $1 = A1

How they are currently grouped:

=SUM(A1,A11:A12,A12:A56,A3)     // Groups: $1 = A1 $2 = A3
=SUM(A11:A12,A10:A12,A34:A3)    // Groups: $1 = A11:A12 $2 = A34:A3
=SUM(A1,A2,A3)                  // Groups: $1 = A1 $2 = A3
=SUM(A1)                        // Groups: $1 = A1

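Note that it captures only the first and the last value. I'm very new to regex, so if I'm doing something badly wrong here, please point me in the right direction. Thanks!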

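2 answers:

Answer 0 (score: 1)

That isn't possible: (...)(?:,(...))+ (2 groups) will always yield exactly 2 captures, no matter how many times + matches. In most regex engines a repeated group keeps only the text from its last iteration.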
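To make the limitation concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself uses PHP and Perl) that runs the asker's original pattern. The helper name captured_groups is hypothetical. With Python's re module the second group holds just the last repetition, and it includes the leading comma because the comma sits inside the group.

```python
import re

# The asker's original pattern: one group for the first value and
# one repeated group for every following ",value".
PATTERN = re.compile(
    r"=SUM\(((?:\w+\d+)(?::\w+\d+)?)((?:,\w+\d+)(?::\w+\d+)?)*\)"
)

def captured_groups(formula):
    """Return the tuple of capture groups, or None if the formula does not match."""
    m = PATTERN.fullmatch(formula)
    return m.groups() if m else None

# Only the first value and the last repetition survive.
print(captured_groups("=SUM(A1,A11:A12,A12:A56,A3)"))  # ('A1', ',A3')
```

The number of groups is fixed by the pattern text, not by how often the quantifier repeats, which is why a second pass over the argument list is needed.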

You need (at least) the following two steps:

value       :=  /\w+\d+(?::\w+\d+)?/

value_list  :=  /value(?:,value)*/

expression  :=  /=SUM\((value_list)\)/

Now match expression, take its group 1 (the value_list), and find all occurrences of value within that match.

A quick PHP demo:

$text = 'should pass

=SUM(A1,A11:A212,A12:A56,A342:A12,A3)
=SUM(A11:A12,A12:a12,A34:A3)
=SUM(A1,A2,A3)
=SUM(A1)

should fail

=SUM(A11:A212:A2,A12:A56,A4,A342:A12)';

$value      = "\w+\d+(?::\w+\d+)?";
$value_list = "$value(?:,$value)*";
$expression = "=SUM\(($value_list)\)";

preg_match_all("/$expression/", $text, $matches);

// iterate over $value_list from $expression (group 1)
foreach($matches[1] as $group1) {
  preg_match_all("/$value/", $group1, $m);
  print_r($m);
}

This prints:

Array
(
    [0] => Array
        (
            [0] => A1
            [1] => A11:A212
            [2] => A12:A56
            [3] => A342:A12
            [4] => A3
        )

)
Array
(
    [0] => Array
        (
            [0] => A11:A12
            [1] => A12:a12
            [2] => A34:A3
        )

)
Array
(
    [0] => Array
        (
            [0] => A1
            [1] => A2
            [2] => A3
        )

)
Array
(
    [0] => Array
        (
            [0] => A1
        )

)
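For comparison, the same two-step idea can be sketched in Python. This is an illustrative port, not part of the original answer: the function name sum_arguments is hypothetical, and unlike the PHP demo (which scans a block of text) it assumes a single formula string and uses fullmatch so that nothing may precede or follow the formula.

```python
import re

# Building blocks mirroring value, value_list and expression above.
VALUE = r"\w+\d+(?::\w+\d+)?"
VALUE_LIST = rf"{VALUE}(?:,{VALUE})*"
EXPRESSION = re.compile(rf"=SUM\(({VALUE_LIST})\)")

def sum_arguments(formula):
    """Return the list of SUM arguments, or None if the formula is invalid."""
    # Step 1: validate the whole formula and capture the argument list.
    m = EXPRESSION.fullmatch(formula)
    if m is None:
        return None
    # Step 2: extract every value from the captured list.
    # VALUE has no capturing groups, so findall returns whole matches.
    return re.findall(VALUE, m.group(1))

print(sum_arguments("=SUM(A1,A11:A212,A12:A56,A342:A12,A3)"))
print(sum_arguments("=SUM(A11:A212:A2,A12:A56,A4,A342:A12)"))
```

The returned list plays the role of the $1, $2, ... groups the question asked for, with as many entries as there are arguments.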

Answer 1 (score: 0)

I would actually split the string first. Something like:

sub IsFormulaValid
{
    my $str = $_[0];
    (my $match) = $str =~ /^=SUM\(([^)]+)\)$/;
    my @sumArgs = split(/,\s*/, $match);
    my $valid = 1;
    foreach(@sumArgs){
        if($_ !~ /^[a-z]+\d+(?::[a-z]+\d+){0,1}$/i){
            $valid = 0;
            last;
        }
    }
    return $valid;
}

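Note that you should also check that the outer match itself succeeded and that @sumArgs contains at least one element when setting $valid; as written, a string that does not match =SUM(...) at all leaves @sumArgs empty and is reported as valid. Tested in Perl with your input: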

my @testInput;

push(@testInput,'=SUM(A1,A11:A212,A12:A56,A342:A12,A3)');
push(@testInput,'=SUM(A11:A12,A12:a12,A34:A3)');
push(@testInput,'=SUM(A1,A2,A3)');
push(@testInput,'=SUM(A1)');
push(@testInput,'=SUM(A11:A212:A2,A12:A56,A4,A342:A12)');

foreach(@testInput){
    print "'$_'\n  ";
    print 'NOT ' if !IsFormulaValid($_);
    print "VALID\n\n";
}

Result:

'=SUM(A1,A11:A212,A12:A56,A342:A12,A3)'
  VALID

'=SUM(A11:A12,A12:a12,A34:A3)'
  VALID

'=SUM(A1,A2,A3)'
  VALID

'=SUM(A1)'
  VALID

'=SUM(A11:A212:A2,A12:A56,A4,A342:A12)'
  NOT VALID
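The additional check on the outer match mentioned above can be sketched as follows, again in Python purely for illustration. The names OUTER, ARGUMENT and is_formula_valid are hypothetical, and the patterns are carried over from the Perl version under the assumption that cell references are letters followed by digits.

```python
import re

# Outer shape of the formula; group 1 is everything between the parentheses.
OUTER = re.compile(r"=SUM\(([^)]+)\)")
# A single cell (A1) or a range (A1:B2), matched case-insensitively.
ARGUMENT = re.compile(r"[a-z]+\d+(?::[a-z]+\d+)?", re.IGNORECASE)

def is_formula_valid(formula):
    """Return True if formula is =SUM(...) with only well-formed arguments."""
    outer = OUTER.fullmatch(formula)
    if outer is None:
        # Reject input that is not a SUM formula at all.
        return False
    # re.split always returns at least one element, so args is never empty.
    args = re.split(r",\s*", outer.group(1))
    return all(ARGUMENT.fullmatch(arg) for arg in args)

for formula in ("=SUM(A1,A2,A3)", "=SUM(A11:A212:A2,A12:A56)", "not a formula"):
    print(formula, is_formula_valid(formula))
```

Because Python's re.split keeps trailing empty fields, an input with a dangling comma such as =SUM(A1,) is rejected in this sketch.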