Question

I use a WebBrowser control to navigate to a site, and in the DocumentCompleted event I do this:

void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    mshtml.HTMLDocument objHtmlDoc = (mshtml.HTMLDocument)webBrowser1.Document.DomDocument;
    string pageSource = objHtmlDoc.documentElement.innerHTML;
}

Now pageSource holds the entire page source. I tried:

string[] lines = File.ReadAllLines(pageSource);

But it throws an exception:

Illegal characters in path

Then I tried this line:

var aContents = Regex.Matches(pageSource, @"<a [^>]*>(.*?)</a>").Cast<Match>().Select(m => m.Groups[1].Value);

But aContents does not contain the href values.

Answer 1

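Use HtmlAgilityPack: http://html-agility-pack.net

You can use the library's methods to load the page from a URL, then check each node to see whether it contains the extension, and store the matches in a collection.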
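
A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class name JpgLinkFinder and its method names are hypothetical and only for illustration. It relies on HtmlWeb.Load, HtmlDocument.LoadHtml, SelectNodes and GetAttributeValue from the library. SelectNodes returns null when nothing matches, so that case is handled explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class JpgLinkFinder   // hypothetical helper name
{
    // Download the page and collect the href of every <a> that ends with .jpg.
    public static List<string> FromUrl(string url)
    {
        HtmlDocument doc = new HtmlWeb().Load(url);
        return Collect(doc);
    }

    // Same thing for markup that is already in memory (e.g. pageSource).
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return Collect(doc);
    }

    private static List<string> Collect(HtmlDocument doc)
    {
        // SelectNodes returns null, not an empty collection, when there is no match.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(h => h.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

In a WinForms project, HtmlAgilityPack.HtmlDocument and System.Windows.Forms.HtmlDocument share a name, so the type may need to be fully qualified if both namespaces are imported in the same file.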

Answer 2

Or just query the links:

string[] hrefs = this.webBrowser1.Document.Links.Cast<HtmlElement>()
             .Select(a => a.GetAttribute("href")).Where(h => h.Contains(".jpg")).ToArray();
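
If only the pageSource string is available rather than the live Document, something along these lines may work. Two observations on the attempts in the question: File.ReadAllLines expects a file path, not file contents, which is why passing HTML to it reports illegal characters in the path; and the regex there captures the text between <a> and </a>, not the href attribute. The sketch below captures the attribute value instead. HrefExtractor and JpgLinks are hypothetical names, and the pattern is a rough approximation, not an HTML parser: it only handles quoted href values and would also match attributes such as data-href.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HrefExtractor   // hypothetical helper name
{
    // Captures the quoted value of href inside an <a ...> tag.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Returns every captured href that ends with .jpg (case-insensitive).
    public static List<string> JpgLinks(string pageSource)
    {
        return HrefPattern.Matches(pageSource)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Where(h => h.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Calling HrefExtractor.JpgLinks(pageSource) inside the DocumentCompleted handler would then give the list directly, with no need to split the source into lines first.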

How can I loop over a string and get the href links that end with jpg?
