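How to split a tag out of HTML

Date: 2018-12-05 12:30:02

Tags: c# tags

I have some very simple HTML text. From it, I only want the image, so that I can attach it somewhere else. How can I cut out just the image tag using C#?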

<p>this is new document<img alt="" height="150" src="https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg" width="200"/>This is new document</p>

I want to get only the img tag out of this data. For example:

<img alt="" height="150" src="https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg" width="200"/>

Code:

var parts = Regex.Split(text.Text, @"(<img>[\s\S]+?<\/img>)").Where(l => l != string.Empty).ToArray();

2 answers:

Answer 0 (score: 0)

You could try a third-party library such as HtmlAgilityPack. Its Example page has some good samples, for instance:

using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        GetLinks();
    }

    // Downloads a page and prints the src attribute of every <img> element.
    private static void GetLinks()
    {
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load("https://www.ynet.co.il/home/0,7340,L-8,00.html");

        List<string> sources = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so check before iterating.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
        if (images != null)
        {
            foreach (HtmlNode image in images)
            {
                string srcValue = image.GetAttributeValue("src", string.Empty);
                sources.Add(srcValue);
            }
        }

        foreach (string item in sources)
        {
            Console.WriteLine(item);
        }

        if (doc.DocumentNode.SelectNodes("//a[@href]") == null)
        {
            Console.WriteLine("no links");
        }
    }
}

This is available at https://dotnetfiddle.net/QAZnDz, and you can also use LINQ to filter the images and so on.
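The sample above collects only the src values from a downloaded page, while the question asks for the whole img tag from a string that is already in memory. A minimal sketch of that variant with LINQ follows, assuming the HtmlAgilityPack NuGet package is referenced. The class name `ImgTagExtractor` and the method name `GetImgTags` are made up for illustration, and filtering on a non-empty src is just one example of a LINQ filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class ImgTagExtractor
{
    // Returns the full markup of every <img> element that has a non-empty src.
    public static List<string> GetImgTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses an in-memory string, no network access

        // Descendants yields an empty sequence when nothing matches,
        // so no null check is needed here (unlike SelectNodes).
        return doc.DocumentNode
                  .Descendants("img")
                  .Where(img => !string.IsNullOrEmpty(
                      img.GetAttributeValue("src", string.Empty)))
                  .Select(img => img.OuterHtml) // the tag itself, not just src
                  .ToList();
    }
}
```

Calling `ImgTagExtractor.GetImgTags(text.Text)` with the HTML from the question should give a list with a single entry, the img tag. If the image also has to be taken out of the original paragraph, calling `Remove()` on the node before reading the parent's `OuterHtml` is one way to do it.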

Answer 1 (score: 0)

You could try the following:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string data = "<p>this is new document<img alt='' height='150' src='https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg' width='200'/>This is new document</p>";
            List<string> newdt = FetchImgsFromSource(data);

            // Print each extracted src value.
            foreach (string src in newdt)
            {
                Console.WriteLine(src);
            }
        }

        // Returns the src attribute value of every <img> tag in htmlSource.
        // The method must live inside the class; C# does not allow methods
        // declared directly in a namespace.
        public static List<string> FetchImgsFromSource(string htmlSource)
        {
            List<string> listOfImgdata = new List<string>();
            string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
            var matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            foreach (Match m in matchesImgSrc)
            {
                // Group 1 captures only the URL between the quotes.
                string href = m.Groups[1].Value;
                listOfImgdata.Add(href);
            }
            return listOfImgdata;
        }
    }
}
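Note that the code above returns only the src URL. The pattern in the question, on the other hand, looks for a literal `<img>` followed by `</img>`, which never occurs in the sample because the tag carries attributes and is self-closing. If the goal is the whole tag, or the text split around it, a rough sketch along these lines may help. The names `ImgSplitter`, `ExtractImgTags` and `SplitAroundImgTags` are hypothetical, and the pattern assumes no attribute value contains a `>` character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ImgSplitter
{
    // One complete <img ...> or <img .../> tag. The capture group makes
    // Regex.Split keep the tag in its result instead of discarding it.
    private static readonly Regex ImgTag =
        new Regex(@"(<img\b[^>]*>)", RegexOptions.IgnoreCase);

    // Returns every img tag found in the input, as full markup.
    public static List<string> ExtractImgTags(string html)
    {
        return ImgTag.Matches(html)
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToList();
    }

    // Splits the input into text before, the tag itself, and text after,
    // dropping empty pieces as the code in the question does.
    public static string[] SplitAroundImgTags(string html)
    {
        return ImgTag.Split(html)
                     .Where(part => part.Length > 0)
                     .ToArray();
    }
}
```

For the HTML in the question, `SplitAroundImgTags` should produce three pieces: the text before the image, the img tag, and the text after it. A regex is fine for small, predictable fragments like this one; for arbitrary HTML a real parser is the safer choice.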
