How do I extract an image from HTML and move it in front of the heading tag?

Asked: 2014-07-06 13:35:09

Tags: html asp.net-mvc vb.net html-agility-pack

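I'm a bit stumped on this one. I have some HTML that contains an image followed by some text, but I need to rearrange it so that the image is shown first: the image, then the h3 tag, then the text.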

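Edit: the code below does not actually remove the style attribute. I thought it was working until I looked more closely at the HTML source, so I also need help stripping the style attribute from markup like this: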

<p>
<img alt="" src="../../../../images/PeterDoocy5.jpg" style="width: 608px; height: 316px;" /></p>

So far I have tried to use HAP (Html Agility Pack) to remove the style attribute from the images in the page:

 <Extension()> Public Function RemoveStyleAttributes(input As String)
        Dim cleint As New WebClient

        Dim html As New HtmlDocument
        html.LoadHtml(input)

        Dim elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@img")

        If elementsWithStyleAttribute IsNot Nothing Then
            For Each element In elementsWithStyleAttribute
                element.Attributes("style").Remove()
            Next
        End If
        Return input
    End Function

But I don't know how to pull the image in front of the h3 tag.

HTML:

<div class="col-md-6">
   <div class="item">
      <div class="content galleryItem">
         <h3>
            DOJ court docs in Abu Khattallah case dispel Obama Admin narrative about the anti-Islam video                            
         </h3>
         <p>
            <img alt="" class="img-responsive" src="../../../../images/AbuKhattala.jpg" />
         </p>
         <p>
            But it was an awful, disgusting video.....
         </p>
      </div>
   </div>
</div>

The extension method as it stands now:

   <Extension()> Public Function RemoveStyleAttributes(html As HtmlDocument)


        Dim divs = html.DocumentNode.SelectNodes("//div[@class='content galleryItem']")

        For Each div As HtmlNode In divs
            'get <img> and remove its style attribute'
            Dim img = div.SelectSingleNode("./p/img[@style]")
            img.Attributes("style").Remove()
            'remove <h3> and <p>text here</p>'
            Dim h3 = div.SelectSingleNode("./h3")
            h3.Remove()
            Dim text = div.SelectSingleNode("./p[not(img)]")
            text.Remove()
            'add <h3> and <p>text here</p> to the parent again in desired order'
            div.AppendChild(h3)
            div.AppendChild(text)
        Next


        Return html.DocumentNode.OuterHtml.ToString
    End Function

I'm trying to use it as @Html.Raw(item.PostSummary.RemoveStyleAttributes)

1 answer:

Answer 0 (score: 0)

You can try it this way:

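'requires Imports System.Runtime.CompilerServices and Imports HtmlAgilityPack; place inside a Module'
<Extension()> Public Function RemoveStyleAttributes(input As String) As String
    Dim html As New HtmlDocument
    html.LoadHtml(input)

    'select every gallery item; SelectNodes returns Nothing when there is no match'
    Dim divs = html.DocumentNode.SelectNodes("//div[@class='content galleryItem']")
    If divs Is Nothing Then Return input

    For Each div As HtmlNode In divs
        'get <img> and remove its style attribute, if it has one'
        Dim img = div.SelectSingleNode("./p/img[@style]")
        If img IsNot Nothing Then img.Attributes("style").Remove()
        'remove <h3> and <p>text here</p>'
        Dim h3 = div.SelectSingleNode("./h3")
        Dim text = div.SelectSingleNode("./p[not(img)]")
        If h3 Is Nothing OrElse text Is Nothing Then Continue For
        h3.Remove()
        text.Remove()
        'add <h3> and <p>text here</p> to the parent again in desired order'
        div.AppendChild(h3)
        div.AppendChild(text)
    Next
    'return the modified document, not the original input string'
    Return html.DocumentNode.OuterHtml
End Function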

Output (formatted; the input is the HTML posted in this question):

<div class="col-md-6">
   <div class="item">
      <div class="content galleryItem">
         <p>
            <img alt="" class="img-responsive" src="../../../../images/AbuKhattala.jpg">
         </p>
         <h3>
            DOJ court docs in Abu Khattallah case dispel Obama Admin narrative about the anti-Islam video
         </h3>
         <p>
            But it was an awful, disgusting video.....
         </p>
      </div>
   </div>
</div>
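
If the style attribute can appear on elements other than the gallery images, or the image paragraph should be moved without re-appending its siblings, the two steps can also be separated. The following is only a rough sketch under a few assumptions: the module name `HtmlCleanupSketch` and the method names `StripStyleAttributes` and `MoveImageBeforeHeading` are made up for illustration, and the markup is assumed to look like the HTML in the question (an h3 and a p wrapping the img as direct children of the same div). It uses the HAP calls `SelectNodes`, `SelectSingleNode`, `Remove` and `InsertBefore`.

```vbnet
Imports System.Runtime.CompilerServices
Imports HtmlAgilityPack

' Hypothetical module and method names, chosen for this sketch only.
Public Module HtmlCleanupSketch

    ' Removes the style attribute from every element that has one and
    ' returns the modified markup (not the original input string).
    <Extension()>
    Public Function StripStyleAttributes(input As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(input)

        ' //*[@style] matches any element carrying a style attribute.
        ' SelectNodes returns Nothing when nothing matches.
        Dim styled = doc.DocumentNode.SelectNodes("//*[@style]")
        If styled IsNot Nothing Then
            For Each node As HtmlNode In styled
                node.Attributes("style").Remove()
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function

    ' Moves the <p> that wraps an <img> directly in front of the <h3>
    ' inside the same <div>.
    <Extension()>
    Public Function MoveImageBeforeHeading(input As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(input)

        ' Assumption: <h3> and <p><img/></p> are direct children of one <div>.
        Dim containers = doc.DocumentNode.SelectNodes("//div[h3 and p/img]")
        If containers IsNot Nothing Then
            For Each container As HtmlNode In containers
                Dim h3 = container.SelectSingleNode("./h3")
                Dim imageParagraph = container.SelectSingleNode("./p[img]")
                ' Detach the paragraph first so it is not left in its old position.
                imageParagraph.Remove()
                container.InsertBefore(imageParagraph, h3)
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Assuming `PostSummary` is a String, the view could then chain the two calls, for example `@Html.Raw(item.PostSummary.StripStyleAttributes().MoveImageBeforeHeading())`. Each call parses the markup again, so for large fragments it may be preferable to merge both steps into a single function.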