Extracting URLs on iPhone using NSRegularExpression

Time: 2012-03-06 16:09:22

Tags: objective-c regex nsstring nsarray nsregularexpression

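I am using the following code in my iPhone app, taken from here, to extract all URLs from stripped .html code.

I am only able to extract the first URL, but I need an array containing all of the URLs. My NSArray does not return an NSString for each URL, only object descriptions.

How do I get my arrayOfAllMatches to return all of the URLs as NSStrings?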

-(NSArray *)stripOutHttp:(NSString *)httpLine {

// Setup an NSError object to catch any failures
NSError *error = NULL;  

// create the NSRegularExpression object and initialize it with a pattern
// the pattern will match any http or https url, with option case insensitive

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];

// create an NSRange object using our regex object for the first match in the string httpline
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];

NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];

// check that our NSRange object is not equal to range of NSNotFound
if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0))) {
    // Since we know that we found a match, get the substring from the parent string by using our NSRange object

    NSString *substringForFirstMatch = [httpLine substringWithRange:rangeOfFirstMatch];

    NSLog(@"Extracted URL: %@",substringForFirstMatch);
    NSLog(@"All Extracted URLs: %@",arrayOfAllMatches);

    // return all matching url strings
    return arrayOfAllMatches;
}

return NULL;

}

Here is my NSLog output:

Extracted URL: http://example.com/myplayer    
All Extracted URLs: (
    "<NSExtendedRegularExpressionCheckingResult: 0x106ddb0>{728, 53}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
    "<NSExtendedRegularExpressionCheckingResult: 0x106ddf0>{956, 66}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
    "<NSExtendedRegularExpressionCheckingResult: 0x106de30>{1046, 63}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
    "<NSExtendedRegularExpressionCheckingResult: 0x106de70>{1129, 67}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}"
)

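5 Answers:

Answer 0: (score: 21)

The method matchesInString:options:range: returns an array of NSTextCheckingResult objects, which is why logging the array prints object descriptions rather than URLs. You can use fast enumeration to iterate through the array, extract the substring for each match from the original string using the match's range, and add the substrings to a new array.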

// Set up an NSError object to catch any failures
NSError *error = nil;

// "https?" makes the "s" optional; "http?" would make the "p" optional and never match https URLs
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];

NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];

NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];

for (NSTextCheckingResult *match in arrayOfAllMatches) {
    // Each result only carries a range; the text itself comes from the original string
    NSString *substringForMatch = [httpLine substringWithRange:match.range];
    NSLog(@"Extracted URL: %@", substringForMatch);

    [arrayOfURLs addObject:substringForMatch];
}

// return non-mutable version of the array
return [NSArray arrayWithArray:arrayOfURLs];
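An alternative to building the intermediate array of matches is to let NSRegularExpression hand each match to a block through enumerateMatchesInString:options:range:usingBlock:. The following is a minimal sketch of that approach, not the answer author's code: the helper name ExtractURLStrings, the simplified pattern (which stops at whitespace, quotes and angle brackets so it behaves inside HTML attributes), and the sample HTML are all assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption): returns every http/https
// URL found in `text` as an NSString.
static NSArray *ExtractURLStrings(NSString *text) {
    if (text == nil) {
        return @[];
    }

    NSError *error = nil;
    // Simplified pattern chosen for this sketch, not the pattern from the question.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"https?://[^\\s\"'<>]+"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *urls = [NSMutableArray array];
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // The result carries only a range; the text comes from the original string.
        if (match != nil) {
            [urls addObject:[text substringWithRange:match.range]];
        }
    }];

    // Return an immutable copy
    return [urls copy];
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<a href=\"http://example.com/a\">A</a> "
                         @"<a href=\"https://example.org/b?x=1\">B</a>";
        NSLog(@"All extracted URLs: %@", ExtractURLStrings(html));
    }
    return 0;
}
```

Setting *stop = YES inside the block ends the enumeration early, which can be handy when only the first few URLs are needed.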

Answer 1: (score: 14)

Try NSDataDetector

NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:text options:0 range:NSMakeRange(0, [text length])];
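The matches returned here are again NSTextCheckingResult objects, so one more step is needed to get strings. The sketch below shows one way to do that by reading each result's URL property; the helper name DetectLinkStrings and the sample text are assumptions for illustration. Note that the detector decides for itself what counts as a link, so the results can differ from what a hand-written regex would find.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption): returns the links that
// NSDataDetector finds in `text`, as absolute URL strings.
static NSArray *DetectLinkStrings(NSString *text) {
    if (text == nil) {
        return @[];
    }

    NSError *error = nil;
    NSDataDetector *detector =
        [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return @[];
    }

    NSArray *matches = [detector matchesInString:text
                                         options:0
                                           range:NSMakeRange(0, text.length)];

    NSMutableArray *urls = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        // Link results expose the detected link as an NSURL
        if (match.resultType == NSTextCheckingTypeLink && match.URL != nil) {
            [urls addObject:match.URL.absoluteString];
        }
    }
    return [urls copy];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"Visit http://example.com/a and https://example.org/b today";
        NSLog(@"Detected links: %@", DetectLinkStrings(text));
    }
    return 0;
}
```

If the exact substring from the source text is wanted instead of the normalized URL, [text substringWithRange:match.range] works here just as it does with NSRegularExpression.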

Answer 2: (score: 8)

NSDataDetector using Swift:

let types: NSTextCheckingType = .Link
var error : NSError?

let detector = NSDataDetector(types: types.rawValue, error: &error)        
var matches = detector!.matchesInString(text, options: nil, range: NSMakeRange(0, count(text)))

for match in matches {
   println(match.URL!)
}
  

Using Swift 2.0:

let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingType = .Link

let detector = try? NSDataDetector(types: types.rawValue)

guard let detect = detector else {
   return
}

// NSRange counts UTF-16 code units, so measure the string with utf16.count
let matches = detect.matchesInString(text, options: [], range: NSMakeRange(0, text.utf16.count))

for match in matches {
   print(match.URL!)
}
  

Using Swift 3.0:

let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingResult.CheckingType = .link

let detector = try? NSDataDetector(types: types.rawValue)

// NSRange counts UTF-16 code units, so measure the string with utf16.count
let matches = detector?.matches(in: text, options: [], range: NSMakeRange(0, text.utf16.count)) ?? []

for match in matches {
   if let url = match.url {
      print(url)
   }
}

Answer 3: (score: 5)

Get all links from a given string

NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *someString = @"www.facebook.com/link/index.php This is a sample www.google.com of a http://abc.com/efg.php?EFAei687e3EsA sentence with a URL within it.";

NSArray *matches = [expression matchesInString:someString options:0 range:NSMakeRange(0, someString.length)];
for (NSTextCheckingResult *result in matches) {
        NSString *url = [someString substringWithRange:result.range];
        NSLog(@"found url:%@", url);
}

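Answer 4: (score: 2)

I found myself so nauseated by the complexity of this simple operation ("match all substrings") that I made a little library, which I humbly call Unsuck, that adds some sanity to NSRegularExpression in the form of the from and allMatches methods. Here is how you would use them: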

NSRegularExpression *re = [NSRegularExpression from: @"(?i)\\b(https?://.*)\\b"]; // or whatever your favorite regex is; Hossam's seems pretty good
NSArray *matches = [re allMatches:httpLine];

Check out the Unsuck source code on GitHub and tell me everything I did wrong :-)

Note that (?i) makes the pattern case-insensitive, so you do not need to specify NSRegularExpressionCaseInsensitive.
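The equivalence between the inline (?i) flag and the NSRegularExpressionCaseInsensitive option can be checked with plain NSRegularExpression, without Unsuck. This is a small sketch under stated assumptions: the helper name CountMatches, the short pattern, and the upper-case sample string are made up for the demonstration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption): counts how many times
// `pattern`, compiled with `options`, matches in `text`.
static NSUInteger CountMatches(NSString *pattern,
                               NSRegularExpressionOptions options,
                               NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:options
                                                    error:NULL];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, text.length)];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"HTTP://EXAMPLE.COM/PAGE";

        // Case-insensitivity requested inside the pattern itself
        NSUInteger inlineFlag = CountMatches(@"(?i)https?://\\S+", 0, text);
        // Case-insensitivity requested through the options argument
        NSUInteger optionFlag = CountMatches(@"https?://\\S+",
                                             NSRegularExpressionCaseInsensitive, text);
        // Neither: the lower-case pattern should not match upper-case text
        NSUInteger neither = CountMatches(@"https?://\\S+", 0, text);

        NSLog(@"inline: %lu, option: %lu, neither: %lu",
              (unsigned long)inlineFlag,
              (unsigned long)optionFlag,
              (unsigned long)neither);
    }
    return 0;
}
```

The inline form keeps the case rule next to the pattern it applies to, which helps when the pattern string is passed through a wrapper that does not expose the options argument.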