Retrieving the URL from an HTML img tag

Date: 2016-07-14 14:11:30

Tags: c# string html-agility-pack

Background information

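I am currently developing a C# web API that returns selected img URLs as base64. I already have a function that performs the base64 conversion, but I receive a large block of text that also contains the img URLs, so I need to cut the URLs out of the string and hand them to my function to convert the images to base64. I read that a library (HtmlAgilityPack) should make this task easy, but when I use it I get "HtmlDocument.cs" not found. However, I am not passing in a document; I am sending an HTML string. I read the documentation, and it is supposed to work with strings too, but it is not working for me. Here is the code that uses HtmlAgilityPack.

Non-working code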

foreach (var item in returnList)
{
    if (item.Content.Contains("~~/picture~~"))
    {
        HtmlDocument doc = new HtmlDocument();
        // Load(string) expects a file path; item.Content holds HTML markup.
        doc.Load(item.Content);
    }
}
Error message from HtmlAgilityPack

[Screenshot of the error message; image not available]

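Question: I receive an HTML string from SharePoint. The string may be tokenized with heading markers and/or picture markers. I am trying to isolate and retrieve the URL from the src attribute of the HTML img tags. I understand that a regular expression may be impractical, but I would consider a regex if it can retrieve the URL from the img src.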

Sample string

Bullet~~Increased Cash Flow</li><li>~~/Document Text Bullet~~Tax Efficient Organizational Structures</li><li>~~/Document Text Bullet~~Tax Strategies that Closely Align with Business Strategies</li><li>~~/Document Text Bullet~~Complete Knowledge of State and Local Tax Obligations</li></ul><p>~~/Document Heading 2~~is the firm of choice</p><p>~~/Document Text~~When it comes to accounting and advisory services is the unique firm of choice. As a trusted advisor to our clients, we bring an integrated client service approach with dedicated industry experience. Dixon Hughes Goodman respects the value of every client relationship and provides clients throughout the U.S. with an unwavering commitment to hands-on, personal attention from our partners and senior-level professionals.</p><p>~~/Document Text~~of choice for clients in search of a trusted advisor to deal with their state and local tax needs. Through our leading best practices and experience, our SALT professionals offer quality and ease to the client engagement. We are proud to provide highly comprehensive services.</p>

    <p>~~/picture~~<br></p><p> 
          <img src="/sites/ContentCenter/Graphics/map-al.jpg" alt="map al" style="width&#58;611px;height&#58;262px;" />&#160;
    <br></p><p><br></p><p>
    ~~/picture~~<br></p><p>
          <img src="/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg" alt="Firm_Telescope_Illustration.jpg" style="margin&#58;5px;width&#58;155px;height&#58;155px;" />    </p><p></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div>

Important

I am working with an HTML string, not a file.

2 answers:

Answer 0: (score: 0)

string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase).Groups[1].Value;

This has been asked many times before, here and here.
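Regex.Match returns only the first img tag, while the sample string contains two. The following is a minimal sketch of the same idea using Regex.Matches to collect every src value. The class name ImgSrcExtractor and the method ExtractAll are hypothetical names chosen for illustration, and the pattern is a slightly tightened variant of the one above, not a full HTML parser: it assumes the src value is quoted, and it would also pick up attributes such as data-src.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ImgSrcExtractor
{
    // Matches the start of an <img ...> tag and captures the quoted src value.
    private static readonly Regex ImgSrc = new Regex(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the src value of every img tag found, in document order.
    public static List<string> ExtractAll(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html ?? string.Empty))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

Calling ImgSrcExtractor.ExtractAll(item.Content) on the sample string should yield the two relative paths under /sites/ContentCenter/Graphics/, which can then be passed to the base64 conversion function one at a time.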

Answer 1: (score: 0)

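The problem you are running into is that C# is looking for a file, and because it cannot find it, it tells you so. This is not an error that will kill your application; it simply tells you that the file was not found, and the library will still read the given string. The documentation can be found at https://htmlagilitypack.codeplex.com/SourceControl/latest#Trunk/HtmlAgilityPackDocumentation.shfbproj. The code below is a cookie-cutter template that anyone can use.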

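Important

C# is looking for a file that it cannot show because the input was supplied as a string. That is the message you are receiving, but you can still work from the supplied document, and it will not affect your code.

Code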

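HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
// LoadHtml parses markup held in a string; Load(string) would treat the argument as a file path.
htmlDocument.LoadHtml("YourContent");

// Select the first <img> element that has a src attribute (null if there is none).
HtmlAgilityPack.HtmlNode url = htmlDocument.DocumentNode.SelectSingleNode("//img[@src]");
HtmlAgilityPack.HtmlAttribute att = url.Attributes["src"];
// "Url" is a placeholder: replace it with the absolute base address of your site.
Uri imgUrl = new System.Uri("Url" + att.Value);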
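The src values in the sample string are site-relative paths, so they need a base address before the image can be downloaded. As a sketch of the "build your url" step, the two-argument System.Uri constructor can resolve a relative src against a base instead of concatenating strings. The class name ImgUrlBuilder, the method ToAbsolute, and the host sharepoint.example.com in the usage note are assumptions made for illustration; the real SharePoint address is not given in the question.

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class ImgUrlBuilder
{
    // Resolves an img src against a base address.
    // If src is already an absolute URL, it is returned unchanged.
    public static Uri ToAbsolute(string baseAddress, string src)
    {
        var baseUri = new Uri(baseAddress, UriKind.Absolute);
        return new Uri(baseUri, src);
    }
}
```

For example, ImgUrlBuilder.ToAbsolute("https://sharepoint.example.com", att.Value) would combine the assumed host with the path read from the src attribute.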