NSString: remove HTML tags but keep <text in angle brackets>

Date: 2015-05-21 16:21:15

Tags: html ios objective-c nsstring

How do I remove HTML tags from an NSString but keep any <Text in angle brackets>? For example:

<p>123 <Hello> abc</p> -> 123 <Hello> abc

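I have tried various regular expression, scanner and XML parser solutions, but they all remove the <Text in angle brackets> along with the tags.

The only solution that works for me is NSAttributedString with the HTML document options: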

NSData *utf8Data = [html dataUsingEncoding:NSUTF8StringEncoding]; // html is the input NSString
NSAttributedString *str = [[NSAttributedString alloc] initWithData:utf8Data
                                                               options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                                                         NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                                                    documentAttributes:nil
                                                                 error:nil];

NSString *result = [str string];

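But this approach relies on WebKit and consumes too much memory for my task.

So how can I strip tags from an NSString while keeping <Text in angle brackets>, without using WebKit, UIWebView or anything similar?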

1 Answer:

Answer 0 (score: 1)

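I just asked a similar question, and some of the answers there may help you. If you don't need a full HTML parser and just want to strip HTML tags, this NSString category method may be useful (it is a modified version of mwaterfall's category):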

- (NSString *)stringByStrippingTags {

    // Find the first < and short-cut if there is none
    NSUInteger ltIndex = [self rangeOfString:@"<" options:NSLiteralSearch].location;
    if (ltIndex == NSNotFound) {
        return [NSString stringWithString:self]; // return copy of string as no tags found
    }

    // Scan and find all tags
    NSScanner *scanner = [NSScanner scannerWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    NSMutableSet *tags = [[NSMutableSet alloc] init];
    NSString *tag;
    do {
        // Scan up to <
        tag = nil;
        [scanner scanUpToString:@"<" intoString:NULL];
        [scanner scanUpToString:@">" intoString:&tag];

        if (tag) {
            NSString *t = [[NSString alloc] initWithFormat:@"%@>", tag];
            [tags addObject:t];
        }

    } while (![scanner isAtEnd]);
    NSMutableString *result = [[NSMutableString alloc] initWithString:self];

    NSString *replacement;
    for (NSString *t in tags) {
        replacement = @" ";
        if ([t isEqualToString:@"<a>"] ||
            [t isEqualToString:@"</a>"] ||
            [t isEqualToString:@"<span>"] ||
            [t isEqualToString:@"</span>"] ||
            [t isEqualToString:@"<strong>"] ||
            [t isEqualToString:@"</strong>"] ||
            [t isEqualToString:@"<em>"] ||
            [t isEqualToString:@"</em>"]) {
            replacement = @"";
        }
        [result replaceOccurrencesOfString:t
                            withString:replacement
                               options:NSLiteralSearch
                                 range:NSMakeRange(0, result.length)];
    }

    // Collapse multiple spaces and line breaks.
    // -stringByRemovingNewLinesAndWhitespace comes from mwaterfall's NSString+HTML category.
    return [result stringByRemovingNewLinesAndWhitespace];
}
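As written, the category collects every `<...>` run it scans, so the `<Hello>` from the question would also be replaced (by a space). A minimal sketch of one way around that, assuming the input only uses a known set of HTML element names: find tag-shaped runs with `NSRegularExpression` and delete only those whose element name is on a whitelist. This is a different technique from the category above (still no WebKit involved); the function name `StripKnownHTMLTags` and the tag list are hypothetical and should be adapted to your input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the answer above): removes only tags whose
// element name is in a whitelist of real HTML elements, so pseudo-tags such
// as <Hello> are left in place. The whitelist is illustrative, not complete.
// Assumes ARC.
static NSString *StripKnownHTMLTags(NSString *html) {
    NSSet *knownTags = [NSSet setWithArray:@[
        @"a", @"b", @"br", @"div", @"em", @"h1", @"h2", @"h3", @"i",
        @"img", @"li", @"ol", @"p", @"span", @"strong", @"u", @"ul"
    ]];

    // Matches <name ...>, </name> and <name/>; group 1 captures the name.
    NSRegularExpression *tagRegex =
        [NSRegularExpression regularExpressionWithPattern:@"</?([A-Za-z][A-Za-z0-9]*)\\b[^>]*>"
                                                  options:0
                                                    error:NULL];
    if (tagRegex == nil) {
        return html; // pattern failed to compile; leave the input untouched
    }

    NSArray *matches = [tagRegex matchesInString:html
                                         options:0
                                           range:NSMakeRange(0, html.length)];
    NSMutableString *result = [html mutableCopy];

    // Delete from the end so the ranges of earlier matches remain valid.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *name = [[html substringWithRange:[match rangeAtIndex:1]] lowercaseString];
        if ([knownTags containsObject:name]) {
            [result deleteCharactersInRange:match.range];
        }
    }
    return result;
}
```

For the question's example, `StripKnownHTMLTags(@"<p>123 <Hello> abc</p>")` returns `@"123 <Hello> abc"`. The trade-off is that bracketed text which happens to be a real element name (for example `<b>`) is still removed, and unlike the category above it only deletes tags: it neither inserts spaces between block elements nor collapses whitespace.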