Improving an algorithm for finding URLs in a body of text - obj-c

Time: 2010-12-29 17:30:16

Tags: objective-c algorithm url nsrange

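I'm trying to come up with an algorithm to find URLs in a body of text. I currently have the following code (this was me just sitting down and hacking something together; I know there has to be a better way):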

    statusText.text = @"http://google.com http://www.apple.com www.joshholat.com";

    NSMutableArray *urlLocations = [[NSMutableArray alloc] init];

    NSRange currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@"http://"];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + x)]];
    }
    currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@"http://www."];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + x)]];
    }
    currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@" www." options:NSLiteralSearch];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + 1 + x)]];
    }

    //Get rid of any duplicate locations
    NSSet *uniqueElements = [NSSet setWithArray:urlLocations];
    [urlLocations release];
    NSArray *finalURLLocations = [[NSArray alloc] init];
    finalURLLocations = [uniqueElements allObjects];

    //Parse out the URLs of each of the locations
    for (int x = 0; x < [finalURLLocations count]; x++) {
        NSRange temp = [[statusText.text substringFromIndex:[[finalURLLocations objectAtIndex:x] intValue]] rangeOfString:@" "];
        int length = temp.location + [[finalURLLocations objectAtIndex:x] intValue];
        if (temp.location > statusText.text.length) length = statusText.text.length;
        length = length - [[finalURLLocations objectAtIndex:x] intValue];
        NSLog(@"URL: %@", [statusText.text substringWithRange:NSMakeRange([[finalURLLocations objectAtIndex:x] intValue], length)]);
    }

I feel like this could be improved by using a regular expression or something similar. Any help improving it would be greatly appreciated.

2 answers:

Answer 0: (score: 5)

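If you are targeting iOS 4.0+, you should let Apple do the work for you and use the built-in data detectors. Create an instance of NSDataDetector with the NSTextCheckingTypeLink option and run it over your string. The documentation for NSDataDetector has some good examples of how to use the class.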

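If you can't or don't want to use the data detectors for any reason, John Gruber posted a good regular expression pattern for detecting URLs a few months ago: http://daringfireball.net/2010/07/improved_regex_for_matching_urls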
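As a rough illustration of the regex route, here is a minimal sketch using NSRegularExpression. Note that NSRegularExpression is itself iOS 4.0+, so this only helps if the OS version is not what rules out the data detectors. The pattern below is a deliberately simplified stand-in chosen for this example, not Gruber's pattern, and `FindURLStrings` is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the substrings of `text` that look like URLs.
// ASSUMPTION: the pattern is a simplified placeholder (anything starting with
// http://, https:// or www. up to the next whitespace). It is NOT Gruber's
// pattern and will, for example, keep trailing punctuation such as "." or ")".
static NSArray *FindURLStrings(NSString *text) {
    NSString *pattern = @"(?:https?://|www\\.)\\S+";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return [NSArray array];
    }

    NSMutableArray *urls = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:text
                                      options:0
                                        range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // match.range is the location and length of the match inside `text`.
        [urls addObject:[text substringWithRange:match.range]];
    }
    return urls;
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"http://google.com http://www.apple.com www.joshholat.com";
        for (NSString *url in FindURLStrings(text)) {
            NSLog(@"URL: %@", url);
        }
    }
    return 0;
}
```

Swapping in a stricter pattern only means changing the `pattern` string; the matching loop stays the same.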

Answer 1: (score: 1)

Just as a follow-up, here is what I changed my code to:

    statusText.text = @"http://google.com http://www.apple.com www.joshholat.com hey there google.com";

    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error];

    NSArray *matches = [detector matchesInString:statusText.text
                                         options:0
                                           range:NSMakeRange(0, statusText.text.length)];

    for (NSTextCheckingResult *match in matches) {
        if ([match resultType] == NSTextCheckingTypeLink) {
            NSLog(@"URL: %@", [[match URL] absoluteURL]);
        }
    }
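Since the original code was collecting the locations of the URLs, it may be worth noting that each NSTextCheckingResult also carries the NSRange of its match. The following is a minimal sketch of collecting those ranges; `URLRangesInText` is a hypothetical helper name introduced for this example, and the sample string is an assumption.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the NSRange of every link the data detector
// finds in `text`, boxed in NSValue objects so they can live in an NSArray.
static NSArray *URLRangesInText(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector =
        [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return [NSArray array];
    }

    NSMutableArray *ranges = [NSMutableArray array];
    [detector enumerateMatchesInString:text
                               options:0
                                 range:NSMakeRange(0, text.length)
                            usingBlock:^(NSTextCheckingResult *match,
                                         NSMatchingFlags flags,
                                         BOOL *stop) {
        [ranges addObject:[NSValue valueWithRange:match.range]];
    }];
    return ranges;
}

int main(void) {
    @autoreleasepool {
        // Sample input chosen for illustration only.
        NSString *text = @"visit http://www.apple.com today";
        for (NSValue *value in URLRangesInText(text)) {
            NSRange range = [value rangeValue];
            NSLog(@"URL %@ at location %lu, length %lu",
                  [text substringWithRange:range],
                  (unsigned long)range.location,
                  (unsigned long)range.length);
        }
    }
    return 0;
}
```

The ranges refer to positions in the original string, so they can be used directly to highlight or replace the detected links without the manual offset arithmetic from the question.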