Converting HTML to a properly formatted attributedString

Asked: 2014-02-24 10:56:09

Tags: ios nsattributedstring

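I need to convert HTML data containing <h2>..</h2>, <p>..</p> and <a href=".."><img ..></a> elements into a properly formatted attributedString. I want to map <h2> to UIFontTextStyleHeadline and <p> to UIFontTextStyleBody, and store the image links. The output should be an attributedString containing only the heading and body elements; I will handle the images separately.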

So far I have this code:

NSMutableAttributedString *content = [[NSMutableAttributedString alloc] 
         initWithData:[[post objectForKey:@"content"] 
    dataUsingEncoding:NSUTF8StringEncoding] 
              options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                   NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]}
   documentAttributes:nil error:nil];

which outputs something like this:

Heading
{
    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47bc00> font-family: \"TimesNewRomanPS-BoldMT\"; font-weight: bold; font-style: normal; font-size: 18.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 14.94, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 2";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;
}{
    NSAttachment = "<NSTextAttachment: 0xd486590>";
    NSColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSLink = "http://www.placeholder.com/image.jpg";
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSStrokeWidth = 0;
}
Body text, body text, body text. Body text, body text, body text.
{
    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;
}

I'm new to attributedString and am looking for an efficient way to convert these attributes to the standard fonts mentioned above. Thanks.

1 Answer:

Answer 0 (score: 0)

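In case anyone is looking for something similar: I ended up using the TFHpple library to separate the images from the text elements in the HTML data, and then changed the formatting attributes of the attributedString as follows: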

NSString *contentString = [self parseHTMLdata:bodyString];

NSMutableAttributedString *content = [[NSMutableAttributedString alloc] initWithData:[contentString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]} documentAttributes:nil error:nil];

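// Prepare the new format: walk the attribute runs and swap in the target fonts.
NSRange effectiveRange = NSMakeRange(0, 0);

while (NSMaxRange(effectiveRange) < [content length]) {

    // Fetch the attributes of the next run; effectiveRange is updated to cover that run.
    NSDictionary *attributes = [content attributesAtIndex:NSMaxRange(effectiveRange) effectiveRange:&effectiveRange];

    UIFont *font = [attributes objectForKey:NSFontAttributeName];

    // The HTML importer renders <h2> at 18 pt (see the output in the question); everything else is treated as body text.
    if (font.pointSize == 18.0f) {

        [content addAttribute:NSFontAttributeName value:self.headlineFont range:effectiveRange];

    } else {

        [content addAttribute:NSFontAttributeName value:self.bodyFont range:effectiveRange];
    }
}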
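
For reference, the same pass over the font runs can also be written with the block-based enumeration API. This is only a minimal sketch, not the code from the answer: the function name `ApplyPreferredFonts` and the constant `kHeadingPointSize` are hypothetical, and obtaining the fonts from `preferredFontForTextStyle:` is just one plausible way to define `headlineFont` and `bodyFont`, which the answer does not show.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: replaces every font run in `content` with a Dynamic Type font.
// Assumption: headings come out of the HTML importer at 18 pt, as in the output in the question.
static void ApplyPreferredFonts(NSMutableAttributedString *content)
{
    static const CGFloat kHeadingPointSize = 18.0; // assumed importer size for <h2>

    UIFont *headlineFont = [UIFont preferredFontForTextStyle:UIFontTextStyleHeadline];
    UIFont *bodyFont = [UIFont preferredFontForTextStyle:UIFontTextStyleBody];

    [content enumerateAttribute:NSFontAttributeName
                        inRange:NSMakeRange(0, content.length)
                        options:0
                     usingBlock:^(id value, NSRange range, BOOL *stop) {
        UIFont *font = value; // nil for runs that carry no font attribute

        // Compare with a small tolerance instead of exact floating-point equality.
        BOOL isHeading = (font != nil) && (fabs(font.pointSize - kHeadingPointSize) < 0.5);

        // Only the range handed to the block is modified during enumeration.
        [content addAttribute:NSFontAttributeName
                        value:(isHeading ? headlineFont : bodyFont)
                        range:range];
    }];
}
```

Calling `ApplyPreferredFonts(content);` right after the attributed string is created would stand in for the `while` loop. The point-size heuristic is unchanged, so it shares the same fragility if the importer's default sizes ever differ.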

And the hpple part:

- (NSString *)parseHTMLdata:(NSString *)content
{
    NSData *data = [content dataUsingEncoding:NSUTF8StringEncoding];

    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];

    NSString *xpathQueryString = @"//body";

    NSArray *elements = [[[parser searchWithXPathQuery:xpathQueryString] firstObject] children];

    NSMutableString *textContent = [[NSMutableString alloc] init];

    for (TFHppleElement *element in elements) {

        if ([[element tagName] isEqualToString:@"h2"] || [[element tagName] isEqualToString:@"p"]) {

            if ([[[element firstChild] tagName] isEqualToString:@"a"]) {

                // image element, just save it in array
            } else {

                // pure h2 or p element
                [textContent appendString:[element raw]];
            }
        }
    }

    return textContent;
}
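
The image branch above is left as a comment. One way it might be filled in is sketched below, under stated assumptions: the helper name `ImageLinksInHTML` is hypothetical, and the sketch assumes the link of interest is the `href` of the `<a>` wrapper (which is what the `NSLink` value in the question's output suggests) rather than the `src` of the inner `<img>`.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: collects the href of every <a> that is the first child
// of a top-level <h2> or <p> inside <body>.
static NSArray *ImageLinksInHTML(NSString *html)
{
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
    NSArray *elements = [[[parser searchWithXPathQuery:@"//body"] firstObject] children];

    NSMutableArray *links = [NSMutableArray array];

    for (TFHppleElement *element in elements) {

        BOOL isTextBlock = [[element tagName] isEqualToString:@"h2"] ||
                           [[element tagName] isEqualToString:@"p"];
        if (!isTextBlock) {
            continue;
        }

        TFHppleElement *anchor = [element firstChild];
        if (![[anchor tagName] isEqualToString:@"a"]) {
            continue; // pure text element, handled by parseHTMLdata:
        }

        // objectForKey: looks up an attribute of the element by name.
        NSString *href = [anchor objectForKey:@"href"];
        if (href != nil) {
            [links addObject:href];
        }
    }

    return links;
}
```

If the `src` of the image is needed instead, `[[anchor firstChildWithTagName:@"img"] objectForKey:@"src"]` would be the analogous lookup.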

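Checking the font size in the attributes may look fragile; if it causes problems, I can dig deeper into the paragraph style, which contains the heading/body marker (the HeaderLevel value in the output above).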