Why does NSRegularExpression not honor capture groups in all cases?

Time: 2011-09-29 15:59:08

Tags: objective-c regex nsregularexpression

Main question: when my pattern is @"\\b(\\S+)\\b", ObjC can tell me there are six matches, but when my pattern is @"A b (c) or (d)", it reports only one match, "c".

Solution

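Here is a function that returns the capture groups as an NSArray. I'm an Objective-C newbie, so I suspect there is a better way to do the clunky part than creating a mutable array and assigning it to an NSArray at the end.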

- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    // Assumes ARC.
    NSError *error = nil;
    // options: takes NSRegularExpressionOptions here; NSRegularExpressionSearch
    // is an NSString comparison option and does not belong in this call.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", strPattern, error);
        return [NSArray array];
    }

    NSMutableArray *arMutable = [NSMutableArray array];
    NSArray *arTextCheckingResults = [regex matchesInString:haystack
        options:0
        range:NSMakeRange(0, [haystack length])];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Index 0 is the whole match; capture groups are 1..numberOfRanges-1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange captureRange = [ntcr rangeAtIndex:captureIndex];
            if (captureRange.location == NSNotFound) {
                continue; // optional group that did not take part in this match
            }
            [arMutable addObject:[haystack substringWithRange:captureRange]];
        }
    }

    // An immutable copy is enough; no separate NSArray needs to be allocated.
    return [arMutable copy];
}
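
The function above flattens every capture from every match into one array, so the caller can no longer tell which match a capture came from. If that matters, here is a minimal sketch, assuming ARC; `CaptureGroupsPerMatch` is a hypothetical name, not a Foundation API. It keeps one inner array per match and uses NSNull for groups that did not participate:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): returns one inner array per match,
// holding that match's capture groups (indexes 1..numberOfRanges-1).
// A group that did not participate is stored as NSNull so positions line up.
NSArray *CaptureGroupsPerMatch(NSString *haystack, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", pattern, error);
        return nil;
    }

    NSMutableArray *matches = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, haystack.length);
    for (NSTextCheckingResult *result in [regex matchesInString:haystack
                                                        options:0
                                                          range:whole]) {
        NSMutableArray *groups = [NSMutableArray array];
        for (NSUInteger i = 1; i < result.numberOfRanges; i++) {
            NSRange r = [result rangeAtIndex:i];
            [groups addObject:(r.location == NSNotFound)
                                  ? (id)[NSNull null]
                                  : [haystack substringWithRange:r]];
        }
        [matches addObject:[groups copy]];
    }
    return [matches copy];
}
```

With the word pattern from the question this yields six one-element arrays; with `.*This (\S+) has (\S+) in it.*` it yields a single array containing `sentence` and `words`.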

Question

I'm used to matching capture groups in Perl with parentheses, like this:

#!/usr/bin/perl -w
use strict;

my $str = "This sentence has words in it.";
if(my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
    print "That $what had '$inner' in it.\n";
}

This code produces:

    That sentence had 'words' in it.
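
The closest Objective-C analogue to Perl's list assignment is to take only the first match with `firstMatchInString:options:range:` and read its groups by index. A sketch, assuming ARC; `FirstMatchCaptures` is a hypothetical helper, and it assumes every group takes part in the match (a non-participating group would make `substringWithRange:` raise):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC) mirroring Perl's
//   my ($what, $inner) = ($str =~ /.../);
// Returns the capture groups of the FIRST match only, or nil when the
// pattern is invalid or does not match. Assumes every group participates.
NSArray *FirstMatchCaptures(NSString *haystack, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *result =
        [regex firstMatchInString:haystack
                          options:0
                            range:NSMakeRange(0, haystack.length)];
    if (result == nil) {
        return nil; // like Perl's match failing in list context
    }
    NSMutableArray *captures = [NSMutableArray array];
    for (NSUInteger i = 1; i < result.numberOfRanges; i++) {
        [captures addObject:[haystack substringWithRange:[result rangeAtIndex:i]]];
    }
    return [captures copy];
}
```

Called with `@"This (\\S+) has (\\S+) in it"`, `captures[0]` and `captures[1]` play the roles of `$what` and `$inner`.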

But in Objective-C, using NSRegularExpression, we get different results. Example function:

- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = NULL;
    NSArray *arTextCheckingResults;

    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0 // NSRegularExpressionOptions; NSRegularExpressionSearch is an NSString compare option
                                  error:&error];

    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", (unsigned long)numberOfMatches);

    arTextCheckingResults = [regex matchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}

Calling this test function shows that it can count the words in the string:

NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];
    Pattern: '\b(\S+)\b'
    Search text: 'This sentence has words in it.'
    Number of matches: 6
    Found string 'This'
    Found string 'sentence'
    Found string 'has'
    Found string 'words'
    Found string 'in'
    Found string 'it'

But what if the capture groups are explicit?

[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];

Result:

    Pattern: '.*This (sentence) has (words) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

Same as above, but using \S+ instead of the actual words:

[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];

Result:

    Pattern: '.*This (\S+) has (\S+) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

What about a wildcard in the middle?

[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];

Result:

    Pattern: '^This (\S+) .* (\S+) in it.$'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'
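
These results are consistent with each NSTextCheckingResult describing one whole match: the word pattern produces six results with one group each, while the explicit patterns produce a single result with two groups, and the loop in `regexTest:pattern:` only ever reads `rangeAtIndex:1`. A small diagnostic sketch (hypothetical helper name, assuming ARC) makes this visible by recording `numberOfRanges` for each match:

```objc
#import <Foundation/Foundation.h>

// Hypothetical diagnostic (assumes ARC): for each match, record
// numberOfRanges, i.e. 1 for the whole match plus 1 per capture group.
// An invalid pattern gives a nil regex, so the loop body never runs.
NSArray *RangeCountsPerMatch(NSString *haystack, NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    NSMutableArray *counts = [NSMutableArray array];
    for (NSTextCheckingResult *result in
             [regex matchesInString:haystack
                            options:0
                              range:NSMakeRange(0, haystack.length)]) {
        [counts addObject:@(result.numberOfRanges)];
    }
    return [counts copy];
}
```

For the word pattern it returns six 2s; for `.*This (\S+) has (\S+) in it.*` it returns a single 3, so the second group (`words`) is available at `rangeAtIndex:2` of that one result.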

References: NSRegularExpression, NSTextCheckingResult, NSRegularExpression matching options

2 answers:

Answer 0 (score: 7)

I think if you change

// returns the range which matched the pattern
NSString *match = [haystack substringWithRange:ntcr.range];

to

// returns the range of the first capture
NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];

you will get the expected result for patterns that contain a single capture.
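
To see the difference between those two lines, here is a sketch (hypothetical helper, assuming ARC and a pattern with at least one capture group that participates in the match) that reads both from the same result:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): returns @[wholeMatch, firstGroup] for
// the first match, or nil if there is no match or no capture group.
// result.range equals [result rangeAtIndex:0], the whole match;
// [result rangeAtIndex:1] is only the first capture group.
NSArray *WholeMatchAndFirstGroup(NSString *haystack, NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *result =
        [regex firstMatchInString:haystack
                          options:0
                            range:NSMakeRange(0, haystack.length)];
    if (result == nil || result.numberOfRanges < 2) {
        return nil;
    }
    NSString *whole = [haystack substringWithRange:result.range];
    NSString *first = [haystack substringWithRange:[result rangeAtIndex:1]];
    return @[whole, first];
}
```

With `.*This (\S+) has (\S+) in it.*`, the whole match is the entire sentence, while group 1 is just `sentence`.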

See the NSTextCheckingResult documentation for rangeAtIndex::

A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).

Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.
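
Following that documentation, a loop from 0 to `numberOfRanges - 1` visits the whole match and then every group. Here is a sketch, assuming ARC, with a hypothetical helper name. It also shows that a group that did not participate (here the `(a)` branch of `(a)|(b)`) still counts toward `numberOfRanges`, but its range has location `NSNotFound`:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): describes every range of the FIRST
// match, from index 0 (whole match) to numberOfRanges - 1 (last group).
// A group that did not take part in the match is reported as @"<none>".
NSArray *DescribeRanges(NSString *haystack, NSString *pattern)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *result =
        [regex firstMatchInString:haystack
                          options:0
                            range:NSMakeRange(0, haystack.length)];
    NSMutableArray *descriptions = [NSMutableArray array];
    // If there is no match, result is nil and numberOfRanges reads as 0.
    for (NSUInteger i = 0; i < result.numberOfRanges; i++) {
        NSRange r = [result rangeAtIndex:i];
        [descriptions addObject:(r.location == NSNotFound)
                                    ? @"<none>"
                                    : [haystack substringWithRange:r]];
    }
    return [descriptions copy];
}
```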

Answer 1 (score: 1)

Change how the NSTextCheckingResult is read:

- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = NULL;
    NSArray *arTextCheckingResults;

    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0
                                  error:&error];
    NSRange stringRange = NSMakeRange(0, [haystack length]);
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack
                                                        options:0 range:stringRange];

    NSLog(@"Number of matches for '%@' in '%@': %lu", strPattern, haystack, (unsigned long)numberOfMatches);

    // options: here is NSMatchingOptions, not NSRegularExpressionOptions.
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:stringRange];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSRange matchRange = [ntcr rangeAtIndex:1];
        NSString *match = [haystack substringWithRange:matchRange];
        NSLog(@"Found string '%@'", match);
    }
}

NSLog output:
Found string 'words'
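
In answer 1 as originally posted, `NSRegularExpressionCaseInsensitive` was passed as the `options:` argument of `matchesInString:options:range:`, but that parameter takes `NSMatchingOptions`; case-insensitivity is an `NSRegularExpressionOptions` flag given when the expression is created. Here is a sketch of that placement (hypothetical helper name, assuming ARC):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): case-insensitivity is requested when the
// regex is created (NSRegularExpressionOptions). The options: argument of
// matchesInString:options:range: takes NSMatchingOptions, so 0 is passed there.
// Returns group 1 of every match that has one.
NSArray *FirstGroupsCaseInsensitive(NSString *haystack, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", pattern, error);
        return nil;
    }
    NSMutableArray *groups = [NSMutableArray array];
    for (NSTextCheckingResult *result in
             [regex matchesInString:haystack
                            options:0
                              range:NSMakeRange(0, haystack.length)]) {
        if (result.numberOfRanges < 2) {
            continue; // pattern has no capture group
        }
        NSRange r = [result rangeAtIndex:1];
        if (r.location != NSNotFound) {
            [groups addObject:[haystack substringWithRange:r]];
        }
    }
    return [groups copy];
}
```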