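I have some very simple HTML text. I only want to take the image out of it so that I can attach it somewhere else. How can I cut out just the image tag using C#?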
<p>this is new document<img alt="" height="150" src="https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg" width="200"/>This is new document</p>
From this data I want to get only the img tag, for example:
<img alt="" height="150" src="https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg" width="200"/>
Code:
var parts = Regex.Split(text.Text, @"(<img>[\s\S]+?<\/img>)").Where(l => l != string.Empty).ToArray();
Answer 0 (score: 0)
You can try a third-party library such as HtmlAgilityPack. Its Example page has some good samples, for example:
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        GetLinks();
    }

    private static void GetLinks()
    {
        // Download and parse the page.
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load("https://www.ynet.co.il/home/0,7340,L-8,00.html");

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so guard before iterating to avoid a NullReferenceException.
        List<string> htmls = new List<string>();
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
        if (images != null)
        {
            foreach (HtmlNode link in images)
            {
                // Collect the src attribute of each <img> element.
                string srcValue = link.GetAttributeValue("src", string.Empty);
                htmls.Add(srcValue);
            }
        }

        foreach (var item in htmls)
        {
            Console.WriteLine(item);
        }

        if (doc.DocumentNode.SelectNodes("//a[@href]") == null)
        {
            Console.WriteLine("no links");
        }
    }
}
This is available at https://dotnetfiddle.net/QAZnDz. You can also use LINQ to filter the images, and so on.
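The sample above downloads a page and prints only the src values. When the HTML is already in a string, as in the question, something along these lines might be closer to what is wanted. This is a minimal sketch, assuming the HtmlAgilityPack package is referenced; the class name `ImgNodeCutter` and the method `CutImgTags` are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImgNodeCutter
{
    // Removes every <img> element from the HTML fragment and returns the
    // markup of the removed elements. The HTML that is left over is
    // returned through remainingHtml.
    public static List<string> CutImgTags(string html, out string remainingHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of downloading a page

        var cut = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img");
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            // Copy the matches first so that removing nodes cannot
            // disturb the collection being iterated.
            foreach (HtmlNode img in new List<HtmlNode>(nodes))
            {
                cut.Add(img.OuterHtml); // markup of the <img> element itself
                img.Remove();           // detach it from the document
            }
        }

        remainingHtml = doc.DocumentNode.OuterHtml;
        return cut;
    }
}
```

Calling `ImgNodeCutter.CutImgTags(html, out string rest)` on the paragraph from the question should give a list with one img tag, with `rest` holding the paragraph without it. The parser may normalize the markup slightly, so the returned tag is not guaranteed to be byte-for-byte identical to the input.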
Answer 1 (score: 0)
You can try the following:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string data = "<p>this is new document<img alt='' height='150' src='https://kuba2storage.blob.core.windows.net/kuba-appid-1/manual-1203/images/desert-20180824203530071.jpg' width='200'/>This is new document</p>";
            var newdt = FetchImgsFromSource(data);
            foreach (string src in newdt)
            {
                Console.WriteLine(src);
            }
        }

        // Returns the src value of every <img> tag found in htmlSource.
        // The method must live inside the class; C# does not allow methods
        // declared directly in a namespace.
        public static List<string> FetchImgsFromSource(string htmlSource)
        {
            List<string> listOfImgdata = new List<string>();
            // Group 1 captures the src value; the quotes around it are optional.
            string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
            var matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            foreach (Match m in matchesImgSrc)
            {
                // m.Value is the whole <img> tag; Groups[1] is only the src URL.
                string href = m.Groups[1].Value;
                listOfImgdata.Add(href);
            }
            return listOfImgdata;
        }
    }
}
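The code above returns only the src URLs, while the question asks for the whole img tag and for the text with that tag cut out. A regex-only variant could look like the sketch below. It assumes that no attribute value contains a `>` character, and the names `ImgTagCutter`, `ExtractImgTags` and `RemoveImgTags` are hypothetical. For anything beyond simple, well-formed fragments, an HTML parser is likely the safer choice.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgTagCutter
{
    // Matches a whole <img ...> or <img .../> tag.
    // Assumption: no '>' appears inside an attribute value.
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns every complete img tag found in the HTML, in document order.
    public static List<string> ExtractImgTags(string html)
    {
        var tags = new List<string>();
        foreach (Match m in ImgTag.Matches(html))
        {
            tags.Add(m.Value); // the full tag, not just the src value
        }
        return tags;
    }

    // Returns the HTML with every img tag removed.
    public static string RemoveImgTags(string html)
    {
        return ImgTag.Replace(html, string.Empty);
    }
}

// Usage, with a shortened stand-in for the markup in the question:
//   ImgTagCutter.ExtractImgTags("<p>a<img src=\"x.jpg\"/>b</p>")
//       gives a list containing <img src="x.jpg"/>
//   ImgTagCutter.RemoveImgTags("<p>a<img src=\"x.jpg\"/>b</p>")
//       gives <p>ab</p>
```

The question's own pattern, `<img>[\s\S]+?<\/img>`, cannot match here because the tag carries attributes and is self-closing, so there is never a literal `<img>` followed by `</img>`.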