This may be a little confusing, since I'm fairly new to Objective-C. My app already fetches the page source:
NSURL *URL = [NSURL URLWithString:@"google.com"];
NSString *webData= [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:nil];
That fetches the source correctly; I've logged it and checked. I only want to find the links in that string, i.e. everything with the keyword:
<a href
I tried searching the string like this:
if ([webData containsString:@"<a href="]) {
    NSLog(@"string contains!");
} else {
    NSLog(@"string does not contain");
}
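It always takes the "does not contain" branch, and I don't understand why. I just want to grab the lines of code that contain links and put those lines into a new string. That string would then hold all the links in the source, but I don't know how to do that. I hope I've given enough information; if you have any questions about my problem, please ask. Thanks.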
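One thing worth ruling out first: `stringWithContentsOfURL:encoding:error:` returns nil when it fails, and any message sent to nil in Objective-C returns NO, so a nil `webData` would always print "string does not contain". A URL string without a scheme (such as "google.com") or a non-ASCII page decoded with `NSASCIIStringEncoding` are plausible causes, though that is only a guess. Below is a minimal sketch of the goal described above, collecting the lines that contain a link into one new string. It assumes a link line is any line containing `<a ` (case-insensitive), which also catches tags where `href` is not the first attribute, such as `<a class=gb1 href=...>`; `LinesContainingLinks` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every line of `html` that contains an opening
// "<a " tag, joined with newlines. Returns an empty string if `html` is nil.
NSString *LinesContainingLinks(NSString *html) {
    if (html == nil) {
        // Messaging nil would silently return NO/nil, so handle it explicitly.
        return @"";
    }
    NSArray<NSString *> *lines =
        [html componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    NSMutableArray<NSString *> *linkLines = [NSMutableArray array];
    for (NSString *line in lines) {
        // Case-insensitive so <A ...> is found too; "<a " (not "<a href")
        // also matches tags where other attributes come before href.
        if ([line rangeOfString:@"<a " options:NSCaseInsensitiveSearch].location != NSNotFound) {
            [linkLines addObject:line];
        }
    }
    return [linkLines componentsJoinedByString:@"\n"];
}
```

This is a line-based heuristic: a tag written as `<a` followed by a tab or line break would be missed.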
Edit 1: I tried the answer given; here is my code:
NSURL *URL = [NSURL URLWithString:@"google.com"];
NSString *webData= [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:nil];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\<a href=\"(.*)\".*<\/a\>"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
NSUInteger numberOfMatches = [regex matchesInString:webData
    options:0
    range:NSMakeRange(0, [webData length])];
First of all, it doesn't work, and I get errors/warnings.
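The exact warnings aren't shown here, but the Edit 1 code has two likely sources. `\<`, `\>` and `\/` are not valid escape sequences in an Objective-C string literal, so clang warns about unknown escapes; and `matchesInString:options:range:` returns an `NSArray`, not an `NSUInteger`. If a count is what you want, `NSRegularExpression` provides `numberOfMatchesInString:options:range:`. Since `<`, `>` and `/` have no special meaning in ICU regular expressions, they need no escaping at all. This sketch keeps Edit 1's pattern otherwise (made lazy); `CountLinkElements` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts <a href="...">...</a> elements in `html`.
NSUInteger CountLinkElements(NSString *html) {
    if (html == nil) {
        return 0;
    }
    NSError *error = nil;
    // No backslashes before < > /: they are ordinary characters in ICU regex.
    // The \" escapes belong to the Objective-C string literal, not the regex.
    // .*? is lazy, so each match stops at the nearest </a>.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<a href=\"(.*?)\".*?</a>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return 0;
    }
    // numberOfMatchesInString: returns an NSUInteger, unlike matchesInString:,
    // which returns an NSArray of NSTextCheckingResult objects.
    return [regex numberOfMatchesInString:html
                                  options:0
                                    range:NSMakeRange(0, html.length)];
}
```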
Edit 2: I've tried to fix the code; it currently looks like this:
NSURL *URL = [NSURL URLWithString:@"google.com"];
NSString *webData= [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:nil];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\<a.+?\\>.+?\\<\\/a\\>"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
NSArray *matches = [regex matchesInString:webData
    options:0
    range:NSMakeRange(0, [webData length])];
NSLog(@"%@", matches);
This is the log output:
2018-11-05 00:12:51.144009-0500 InjectionTest[42684:6739102] (
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2c00>{25654, 124}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2cc0>{38864, 316}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2340>{39939, 105}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2100>{40051, 103}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2000>{40203, 125}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2140>{41190, 91}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b0f00>{41297, 67}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x6000037b2d80>{41479, 124}{<NSRegularExpression: 0x600002ca0210> \\<a.+?\\>.+?\\<\\/a\\> 0x1}"
)
I'm pretty sure that's not what I should be getting.
Answer 0 (score: 0)
I'd suggest using NSRegularExpression instead. With an appropriate pattern, such as:
\<a href=\"(.*)\".*<\/a\>
together with the method:
matchesInString:options:range:
you'll get a list of the <a> elements in the HTML string.
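One caveat about that pattern: `(.*)` and `.*` are greedy, so when two links sit on the same line, a single match runs from the first `<a href="` to the last `</a>` on that line (by default `.` does not match line breaks, so a match cannot span lines). The sketch below compares the greedy pattern with a lazy `(.*?)` variant on a small made-up string. `HrefCaptures` is a hypothetical helper, and the patterns are written without the unneeded backslashes so they compile cleanly as Objective-C literals.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns capture group 1 of every match of `pattern` in `text`.
NSArray<NSString *> *HrefCaptures(NSString *pattern, NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSMutableArray<NSString *> *captures = [NSMutableArray array];
    for (NSTextCheckingResult *match in
         [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)]) {
        // rangeAtIndex:1 is the first parenthesised group, i.e. the href value.
        [captures addObject:[text substringWithRange:[match rangeAtIndex:1]]];
    }
    return captures;
}
```

On the input `<a href="/a">A</a> | <a href="/b">B</a>`, the greedy pattern `<a href="(.*)".*</a>` yields one capture, `/a">A</a> | <a href="/b`, while the lazy `<a href="(.*?)".*?</a>` yields `/a` and `/b`.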
For more details, read Apple's official documentation:
https://developer.apple.com/documentation/foundation/nsregularexpression?language=objc
** 更新 **
Sample code that extracts all <a> elements from the HTML text:
NSURL *url = [NSURL URLWithString:@"https://www.google.com"];
NSString *html = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
// Lazy .+? stops each match at the nearest "</a>"; '.' does not match line breaks,
// so an <a> element split across lines is not matched.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\<a.+?\\>.+?\\<\\/a\\>"
    options:NSRegularExpressionCaseInsensitive
    error:nil];
NSArray *matches = [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
for (NSTextCheckingResult *match in matches) {
    // Each NSTextCheckingResult holds a range, not the text; copy that range out of the HTML.
    NSRange matchRange = [match range];
    NSString *matchedString = [html substringWithRange:matchRange];
    NSLog(@"%@", matchedString);
}
Here's the log from the code above:
2018-11-05 14:30:01.252702 TestArray[1322:320814] <a class=gb1 href="https://www.google.co.jp/imghp?hl=ja&tab=wi">画像</a>
2018-11-05 14:30:01.252911 TestArray[1322:320814] <a class=gb1 href="https://maps.google.co.jp/maps?hl=ja&tab=wl">マップ</a>
2018-11-05 14:30:01.253120 TestArray[1322:320814] <a class=gb1 href="https://play.google.com/?hl=ja&tab=w8">Play</a>
2018-11-05 14:30:01.253346 TestArray[1322:320814] <a class=gb1 href="https://www.youtube.com/?gl=JP&tab=w1">YouTube</a>
2018-11-05 14:30:01.253512 TestArray[1322:320814] <a class=gb1 href="https://news.google.co.jp/nwshp?hl=ja&tab=wn">ニュース</a>
2018-11-05 14:30:01.253638 TestArray[1322:320814] <a class=gb1 href="https://mail.google.com/mail/?tab=wm">Gmail</a>
2018-11-05 14:30:01.253750 TestArray[1322:320814] <a class=gb1 href="https://drive.google.com/?tab=wo">ドライブ</a>
2018-11-05 14:30:01.253934 TestArray[1322:320814] <a class=gb1 style="text-decoration:none" href="https://www.google.co.jp/intl/ja/options/"><u>もっと見る</u> »</a>
2018-11-05 14:30:01.254049 TestArray[1322:320814] <a href="http://www.google.co.jp/history/optout?hl=ja" class=gb4>ウェブ履歴</a>
2018-11-05 14:30:01.254164 TestArray[1322:320814] <a href="/preferences?hl=ja" class=gb4>設定</a>
2018-11-05 14:30:01.254274 TestArray[1322:320814] <a target=_top id=gb_70 href="https://accounts.google.com/ServiceLogin?hl=ja&passive=true&continue=https://www.google.com/" class=gb4>ログイン</a>
2018-11-05 14:30:01.254434 TestArray[1322:320814] <a href="/advanced_search?hl=ja&authuser=0">検索オプション</a>
2018-11-05 14:30:01.254739 TestArray[1322:320814] <a href="/language_tools?hl=ja&authuser=0">言語ツール</a>
2018-11-05 14:30:01.254900 TestArray[1322:320814] <a href="https://www.google.com/setprefs?sig=0_hs-qGLtJFycvdIdXbi2jQdSOY4s%3D&hl=en&source=homepage&sa=X&ved=0ahUKEwi_tb7pwrzeAhULwLwKHZ9aDiAQ2ZgBCAU">English</a>
2018-11-05 14:30:01.255072 TestArray[1322:320814] <a href="/intl/ja/ads/">広告掲載</a>
2018-11-05 14:30:01.255182 TestArray[1322:320814] <a href="http://www.google.co.jp/intl/ja/services/">ビジネス ソリューション</a>
2018-11-05 14:30:01.255453 TestArray[1322:320814] <a href="https://plus.google.com/115899767381375908215" rel="publisher">+Google</a>
2018-11-05 14:30:01.255609 TestArray[1322:320814] <a href="/intl/ja/about.html">Google について</a>
2018-11-05 14:30:01.255722 TestArray[1322:320814] <a href="https://www.google.com/setprefdomain?prefdom=JP&prev=https://www.google.co.jp/&sig=K_erqW_iZ2bjJu2TsKii5UfNnAGcg%3D">Google.co.jp</a>
2018-11-05 14:30:01.255832 TestArray[1322:320814] <a href="/intl/ja/policies/privacy/">プライバシー</a>
2018-11-05 14:30:01.256001 TestArray[1322:320814] <a href="/intl/ja/policies/terms/">規約</a>
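Since the question asks for the links themselves rather than whole <a> elements, a capture group can pull out just the href values. Several tags in the log above put `href` after other attributes (for example `<a class=gb1 href=...>`), so this sketch allows other attributes first. It is a regex heuristic, not an HTML parser, and it assumes double-quoted href values; `LinkURLsInHTML` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the double-quoted href value of every <a> tag in `html`.
NSArray<NSString *> *LinkURLsInHTML(NSString *html) {
    NSMutableArray<NSString *> *urls = [NSMutableArray array];
    if (html == nil) {
        return urls;
    }
    // "<a", whitespace, optionally other attributes ending in whitespace, then href="...".
    // Requiring whitespace before "href" skips attributes such as data-href.
    // Group 1 captures the URL; [^"]* stops at the closing quote.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<a\\s(?:[^>]*?\\s)?href=\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        [urls addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return urls;
}
```

To get a single string holding all the links, as described in the question, the result can be joined: `[LinkURLsInHTML(html) componentsJoinedByString:@"\n"]`.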