Remove HTML tags etc. from an NSString

Asked: 2011-05-29 21:28:58

Tags: iphone objective-c

Possible duplicate:
  Remove HTML Tags from an NSString on the iPhone

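I'd like to know the best way to remove all HTML / JavaScript etc. tags from an NSString.

The solution I'm currently using leaves comments and some other tags behind. What is the best way to remove those as well?

I know of solutions such as LibXML, but I'd like some examples to work from.

Current solution: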

- (NSString *)flattenHTML:(NSString *)html trimWhiteSpace:(BOOL)trim {

    NSScanner *theScanner;
    NSString *text = nil;

    theScanner = [NSScanner scannerWithString:html];

    while ([theScanner isAtEnd] == NO) {

        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL] ;                 
        // find end of tag         
        [theScanner scanUpToString:@">" intoString:&text] ;

        // remove the found tag from the string
        // (replace with @" " instead if words should stay separated)
        html = [html stringByReplacingOccurrencesOfString:
                [ NSString stringWithFormat:@"%@>", text]
                                               withString:@""];
    }

    // trim off whitespace
    return trim ? [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] : html;  
}

1 answer:

Answer 0 (score: 17)

Try this method to remove HTML tags from a string:

- (NSString *)stripTags:(NSString *)str
{
    NSMutableString *html = [NSMutableString stringWithCapacity:[str length]];

    NSScanner *scanner = [NSScanner scannerWithString:str];
    // don't skip whitespace, so spacing in the text is preserved
    scanner.charactersToBeSkipped = nil;
    NSString *tempText = nil;

    while (![scanner isAtEnd])
    {
        // collect the text that precedes the next tag
        [scanner scanUpToString:@"<" intoString:&tempText];

        if (tempText != nil)
            [html appendString:tempText];

        // skip over the tag itself
        [scanner scanUpToString:@">" intoString:NULL];

        // step past the closing '>'
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation] + 1];

        tempText = nil;
    }

    return html;
}
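
The question also mentions comments and JavaScript being left behind. The following is only a rough sketch of how the same NSScanner approach might be extended to drop HTML comments and the contents of script and style elements. The function name StripMarkup is hypothetical and not part of the answer above. It assumes reasonably well-formed input, does not decode entities such as &amp;, and is not a substitute for a real HTML parser.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): removes tags,
// HTML comments, and everything inside <script>/<style> elements.
static NSString *StripMarkup(NSString *input)
{
    NSMutableString *out = [NSMutableString stringWithCapacity:[input length]];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil; // keep whitespace in the text
    scanner.caseSensitive = NO;          // also match <SCRIPT>, <Style>, ...

    while (![scanner isAtEnd])
    {
        // Copy the plain text that precedes the next '<'.
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text])
            [out appendString:text];

        if ([scanner isAtEnd])
            break;

        // Work out what kind of markup starts here and where it ends.
        NSString *terminator = @">";
        if ([scanner scanString:@"<!--" intoString:NULL])
            terminator = @"-->";
        else if ([scanner scanString:@"<script" intoString:NULL])
            terminator = @"</script>";
        else if ([scanner scanString:@"<style" intoString:NULL])
            terminator = @"</style>";

        // Skip up to the terminator, then past it (if it exists).
        [scanner scanUpToString:terminator intoString:NULL];
        [scanner scanString:terminator intoString:NULL];
    }

    return out;
}
```

It could be called like this:

    NSString *plain = StripMarkup(@"<p>Hello <!-- note --><b>world</b></p>");
    NSLog(@"%@", plain);

An unterminated comment or script block simply swallows the rest of the string, which seemed the safer default for this sketch.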