Removing the content between script and style tags in Objective-C

Date: 2013-07-21 20:51:27

Tags: objective-c tags download

OK, so I'm working on a web scraper that converts a web page into paragraphs of text. To remove the tags themselves, I found this on Stack Overflow:

- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];

    NSScanner *scanner = [NSScanner scannerWithString:str];
    // Do not skip whitespace, so spacing in the text is preserved.
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        // Copy the text in front of the next tag.
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        // Skip the tag itself, including its closing ">".
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }

    return ms;
}

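It works, but it only removes the tags themselves, not the content between script and style tags. (Obviously I don't want to remove the content between all tags, because that would leave an empty string.)

Is there a way to cut out the script and style elements together with their content?

Many thanks in advance.

Edit:

I tried changing the code to: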

- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];

    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<script" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@"script>" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }

    return ms;
}

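But the script and CSS content is still included.

1 answer:

Answer 0 (score: 1)

You can edit the scanner code so that it checks each tag. If the tag is one you want to remove, scan up to its closing tag and discard that string, so that you never store or append it.

Read the tag start (<), then read the tag so you can check what it is. Then read up to the tag close and either drop or keep what you read.

Start with something like this (typed inline, not tested in any way):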

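NSString *t = nil;  // holds the tag text; s is declared as before
while (![scanner isAtEnd])
{
    // Copy the text in front of the next tag.
    [scanner scanUpToString:@"<" intoString:&s];
    if (s != nil)
        [ms appendString:s];
    // Read the tag; t starts with "<", e.g. "<tagToIgnore attr=...".
    [scanner scanUpToString:@">" intoString:&t];
    if ([[t lowercaseString] hasPrefix:@"<tagtoignore"]) {
        // Discard the content up to the end tag, then the end tag itself.
        [scanner scanUpToString:@"</tagToIgnore" intoString:NULL];
        [scanner scanUpToString:@">" intoString:NULL];
    }
    // Step over the ">" that closes the tag (or the end tag).
    if (![scanner isAtEnd])
        [scanner setScanLocation:[scanner scanLocation]+1];
    s = nil;
    t = nil;
}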
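
For reference, the idea above can be fleshed out into a small self-contained program. This is only a sketch under some assumptions: the names TagName and StripTagsAndIgnoredContent are made up for illustration and are not part of the original post, it relies on NSScanner being case-insensitive by default, and it assumes reasonably well-formed HTML (every opening script or style tag has a matching end tag, and no ">" appears inside attribute values). It is not a substitute for a real HTML parser.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the lowercased element name of a tag body
// such as "script type=\"text/javascript\"" or "/p".
static NSString *TagName(NSString *tagBody)
{
    NSScanner *s = [NSScanner scannerWithString:tagBody];
    [s setCharactersToBeSkipped:nil];
    [s scanString:@"/" intoString:NULL];  // optional slash of an end tag
    NSString *name = nil;
    [s scanCharactersFromSet:[NSCharacterSet alphanumericCharacterSet]
                  intoString:&name];
    return name ? [name lowercaseString] : @"";
}

// Hypothetical function: strips all tags, and also drops the content of
// any element whose name is in ignoredTags (for example script and style).
static NSString *StripTagsAndIgnoredContent(NSString *str, NSSet *ignoredTags)
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];

    while (![scanner isAtEnd])
    {
        // Keep the text in front of the next tag.
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text])
            [ms appendString:text];

        // Consume "<"; if there is none, the input is exhausted.
        if (![scanner scanString:@"<" intoString:NULL])
            break;

        // Read the tag body up to ">" and step over the ">".
        NSString *tagBody = nil;
        [scanner scanUpToString:@">" intoString:&tagBody];
        [scanner scanString:@">" intoString:NULL];

        // For an opening tag that should be ignored, discard everything
        // up to and including the matching end tag.
        NSString *name = TagName(tagBody ? tagBody : @"");
        if ([ignoredTags containsObject:name] && ![tagBody hasPrefix:@"/"])
        {
            NSString *endTag = [NSString stringWithFormat:@"</%@", name];
            [scanner scanUpToString:endTag intoString:NULL];
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return ms;
}

int main(void)
{
    @autoreleasepool {
        NSString *html = @"<html><head><style>p { color: red; }</style>"
                         @"<script type=\"text/javascript\">if (a < b) { run(); }</script>"
                         @"</head><body><p>Hello <b>world</b></p></body></html>";
        NSSet *ignored = [NSSet setWithObjects:@"script", @"style", nil];
        // Logs "Hello world": the CSS and JavaScript are dropped.
        NSLog(@"%@", StripTagsAndIgnoredContent(html, ignored));
    }
    return 0;
}
```

Scanning for the literal end tag (for example "</script") rather than for the next "<" matters here, because script content can itself contain "<", as in the comparison a < b above.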