I am loading an entire HTML page and want to get everything between specific tags. To do that, I am doing the following:
articleXpathQueryString = @"//article/div[@class='entry breadtext']";
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString];
item.content = [self recursiveHTMLIterator:articleNodes content:@""];
Then I have a recursive function that tries to concatenate the content of all child nodes together with their HTML tags:
- (NSString *)recursiveHTMLIterator:(NSArray *)elementArray content:(NSString *)content {
    for (TFHppleElement *element in elementArray) {
        if (![element hasChildren]) {
            // The element has no children
        } else {
            // The element has children
            NSString *tmpStr = [[element firstChild] content];
            if (tmpStr != nil) {
                NSString *css = [element tagName];
                content = [content stringByAppendingString:[self createOpenTag:css]];
                content = [content stringByAppendingString:tmpStr];
                content = [content stringByAppendingString:[self createCloseTag:css]];
            }
            NSString *missingStr = [[element firstTextChild] content];
            if (![missingStr isEqualToString:tmpStr]) {
                if (missingStr != nil) {
                    NSString *css = [element tagName];
                    content = [content stringByAppendingString:[self createOpenTag:css]];
                    content = [content stringByAppendingString:missingStr];
                    content = [content stringByAppendingString:[self createCloseTag:css]];
                }
            }
            content = [self recursiveHTMLIterator:element.children content:content];
        }
    }
    return content;
}
However, although the result is somewhat satisfactory, it does not pick up img tags, and it gets a bit messy when the HTML is formatted like this:
<p>
<strong>-</strong>
This text is not parsed because it skips it after it acquires <strong>-</strong>, this is why I have the second if-statement which catches up "missing strings", but they are inserted in the wrong order
</p>
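One way to avoid the ordering problem shown above is to visit each node's children strictly in document order and emit text nodes where they occur, instead of looking up firstChild and firstTextChild separately. The following is only a minimal sketch. It assumes that TFHppleElement exposes children, isTextNode, tagName, attributes and content as in the TFHpple headers, and SerializeElement is a hypothetical helper name that is not part of TFHpple:

```objc
#import <Foundation/Foundation.h>
#import "TFHppleElement.h"

// Hypothetical helper (not part of TFHpple): rebuilds the markup of a node
// and its descendants by walking the children in document order.
static NSString *SerializeElement(TFHppleElement *element) {
    if (element == nil) {
        return @"";
    }
    // Text nodes contribute only their text, at the position where they occur.
    if ([element isTextNode]) {
        return element.content ?: @"";
    }

    NSMutableString *html = [NSMutableString stringWithFormat:@"<%@", element.tagName];

    // Re-emit attributes such as src on img. Values are not escaped here,
    // so a value containing a double quote would need extra handling.
    [element.attributes enumerateKeysAndObjectsUsingBlock:^(id key, id value, BOOL *stop) {
        [html appendFormat:@" %@=\"%@\"", key, value];
    }];
    [html appendString:@">"];

    // Children are appended in the order the parser reports them, so text
    // that follows <strong>-</strong> stays after it.
    for (TFHppleElement *child in element.children) {
        [html appendString:SerializeElement(child)];
    }

    // Childless elements such as img also get a closing tag in this sketch.
    [html appendFormat:@"</%@>", element.tagName];
    return html;
}
```

It could be called as `item.content = SerializeElement([articleNodes firstObject]);`. The sketch does not re-escape text content or treat comment nodes specially, so the output is suited to display in a web view rather than being an exact copy of the source markup.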
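So my question is: should I keep trying to get the recursive approach to parse this correctly, or is there an easier way to get the HTML I need (which I then use in a web view)? What I am looking for is everything inside
<article> THIS </article>.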
In other words, I would like to do something like this with TFHpple (although this code does not work):
articleXpathQueryString = @"//article/div[@class='entry breadtext']";
articleNodes = [articleParser searchWithXPathQuery:articleXpathQueryString];
item.content = [articleParser allContentAsString]; //I simply want everything in articleParser in a string format
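Answer 0 (score: 0)
OK, I finally figured it out. I hope this helps in case anyone else is as stupid as I was:
All you need to do is load the URL into a web view and then run a simple JavaScript query like the following (in webViewDidFinishLoad):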
NSString *bread_text = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementsByClassName('entry breadtext')[0].innerHTML"];
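This returns everything inside the first element with the known class. Now I need to figure out how to load it without displaying the web view first, but that seems easier than iterating through the XML structure :)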
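For loading without showing the web view, one possibility is to create a UIWebView that is never added to a view hierarchy and read the result in its delegate callback. This is only a sketch and is untested against this page. ArticleLoader, loadArticleAtURL:completion: and the completion block are hypothetical names introduced for illustration:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper class: owns a web view that is never put on screen.
@interface ArticleLoader : NSObject <UIWebViewDelegate>
@property (nonatomic, strong) UIWebView *webView;
@property (nonatomic, copy) void (^completion)(NSString *html);
- (void)loadArticleAtURL:(NSURL *)url completion:(void (^)(NSString *html))completion;
@end

@implementation ArticleLoader

// Call on the main thread; UIKit objects are not thread-safe.
- (void)loadArticleAtURL:(NSURL *)url completion:(void (^)(NSString *html))completion {
    self.completion = completion;
    // Zero-sized frame and no addSubview: call, so nothing is displayed.
    self.webView = [[UIWebView alloc] initWithFrame:CGRectZero];
    self.webView.delegate = self;
    [self.webView loadRequest:[NSURLRequest requestWithURL:url]];
}

- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Same query as in the answer: innerHTML of the first matching element.
    NSString *html = [webView stringByEvaluatingJavaScriptFromString:
        @"document.getElementsByClassName('entry breadtext')[0].innerHTML"];
    if (self.completion) {
        self.completion(html);
    }
}

@end
```

The caller has to keep a strong reference to the ArticleLoader until the completion block runs, because the web view's delegate property does not retain it. Note also that webViewDidFinishLoad: can fire more than once for pages that contain frames, so the block may be invoked several times.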