Question

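In my application, I have the URLs of some web pages, and I want to get a particular part of the page at a given URL (its HTML page) by identifying HTML tags.

For example, based on the HTML source, I want to get the part from <div id="content"> to </div>, so that I can save it in another file.

E.g., my URL is http://www.makepartsfast.com/2012/09/4337/more-3d-printing-in-metals-ex-one-introduces-the-m-flex-3d-printing-system/, which opens an HTML page. I only want one part of that page.

How can I do this?

Thanks.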

Answer 1

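Here is a Cocoa + NSString solution (working and tested). When you use a custom parser like this, the only real trick is finding the "end" point. You can't just scan up to "</div>", because other divs are opened in between, so your parser would stop before the end you want. I'm obviously not saying there is no other way to do this, such as using a more sophisticated XML parser. But web pages are not that easy to parse, and their markup is not always perfect... This is simple and it works (you should also consider another way of fetching the URL content than stringWithContentsOfURL:, which is not asynchronous):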

// Synchronous fetch: this blocks the calling thread until the page has loaded.
NSString *originalString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.makepartsfast.com/2012/09/4337/more-3d-printing-in-metals-ex-one-introduces-the-m-flex-3d-printing-system/"] encoding:NSUTF8StringEncoding error:nil];

NSScanner *scanner = [NSScanner scannerWithString:originalString];
NSString *extractedString = nil;

// Skip everything up to and including the opening tag.
[scanner scanUpToString:@"<div id=\"content\">" intoString:nil];
[scanner scanString:@"<div id=\"content\">" intoString:nil];

// Capture everything up to a marker specific to this page, used as the end point instead of </div>.
[scanner scanUpToString:@"<div style=\"clear:both;\">" intoString:&extractedString];

if (extractedString)
{
    // string was extracted
    NSLog(@"%@", extractedString);
}
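
If the page has no convenient page-specific marker to scan up to, one way around the nested-div problem described above is to count div depth while scanning. The following is only a rough sketch in plain C (which also compiles as-is inside an Objective-C file); the function names find_div_content and is_tag_boundary are made up for this illustration and are not part of any SDK. It assumes lowercase tag names and does not skip comments or script blocks, so treat it as a starting point rather than a robust HTML parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c can legally follow a tag name like "div". */
static int is_tag_boundary(char c)
{
    return c == '>' || c == '/' || c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Hypothetical helper: finds open_tag (e.g. "<div id=\"content\">") in html and
 * returns a pointer to the first character after it. *out_len receives the
 * number of characters up to, but not including, the matching "</div".
 * Returns NULL if open_tag is missing or the divs are unbalanced.
 */
const char *find_div_content(const char *html, const char *open_tag, size_t *out_len)
{
    assert(html != NULL && open_tag != NULL && out_len != NULL);

    const char *start = strstr(html, open_tag);
    if (start == NULL) {
        return NULL;
    }
    start += strlen(open_tag);

    int depth = 1; /* we are inside the div opened by open_tag */
    const char *p = start;
    while (*p != '\0') {
        if (strncmp(p, "</div", 5) == 0 && is_tag_boundary(p[5])) {
            depth--;
            if (depth == 0) {
                *out_len = (size_t)(p - start);
                return start;
            }
            p += 5;
        } else if (strncmp(p, "<div", 4) == 0 && is_tag_boundary(p[4])) {
            depth++; /* a nested div opens; its </div> must not end the capture */
            p += 4;
        } else {
            p++;
        }
    }
    return NULL; /* reached the end without finding the matching close tag */
}
```

From Objective-C you could pass [originalString UTF8String] as html and build an NSString from the returned pointer and length with initWithBytes:length:encoding:.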

Answer 2

Take a look at Ray Wenderlich's tutorial, How to Parse HTML on iOS. Hope this helps.

http://www.raywenderlich.com/14172/how-to-parse-html-on-ios

Answer 3

You can use the open-source library GDataXMLNode. It lets you manipulate XML files. Take a look at:

http://www.raywenderlich.com/725/how-to-read-and-write-xml-documents-with-gdataxml
http://code.google.com/p/gdata-objectivec-client/source/browse/trunk/Source/XMLSupport/?r=129

Answer 4

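Your best bet is to use NSXMLParser to search for the div tag whose id attribute equals "content". Capture everything in between until the corresponding </div> end tag. See Apple's tutorial.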

How can I get only a specific part of the text from a URL using the iPhone SDK?
