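I'm trying to go through a web page's source code and add every <img src="http://www.dot.com/image.jpg"> tag to an HtmlElementCollection. I then use a foreach loop to go through each element in the collection and download the image from its URL.
Here is what I have so far. My problem is that nothing gets downloaded, and I suspect my elements are not being added correctly by tag name. If they are, I can't seem to reference them in order to download them.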
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public void button1_Click(object sender, EventArgs e)
        {
            string url = urlTextBox.Text;
            string sourceCode = WorkerClass.ScreenScrape(url);
            StreamWriter sw = new StreamWriter("sourceScraped.html");
            sw.Write(sourceCode);
        }

        private void button2_Click(object sender, EventArgs e)
        {
            string url = urlTextBox.Text;
            WebBrowser browser = new WebBrowser();
            browser.Navigate(url);
            HtmlElementCollection collection;
            List<HtmlElement> imgListString = new List<HtmlElement>();
            if (browser != null)
            {
                if (browser.Document != null)
                {
                    collection = browser.Document.GetElementsByTagName("img");
                    if (collection != null)
                    {
                        foreach (HtmlElement element in collection)
                        {
                            WebClient wClient = new WebClient();
                            string urlDownload = element.FirstChild.GetAttribute("src");
                            wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/')));
                        }
                    }
                }
            }
        }
    }
    }
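Answer 0 (score: 2)
When you call Navigate, you are assuming the document is immediately ready to be traversed and checked for images. In reality it takes some time to load, so you need to wait until the document has finished loading.
Add a DocumentCompleted event handler to the browser object:
    browser.DocumentCompleted += browser_DocumentCompleted;
and implement it as: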
    static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The sender is the WebBrowser whose document has finished loading.
        WebBrowser browser = (WebBrowser)sender;
        if (browser != null)
        {
            if (browser.Document != null)
            {
                HtmlElementCollection collection = browser.Document.GetElementsByTagName("img");
                if (collection != null)
                {
                    foreach (HtmlElement element in collection)
                    {
                        WebClient wClient = new WebClient();
                        // Read src from the img element itself, not from its FirstChild.
                        string urlDownload = element.GetAttribute("src");
                        wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/')));
                    }
                }
            }
        }
    }
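The second argument to DownloadFile in this thread is built with Substring and LastIndexOf('/'), which keeps the leading slash and any query string in the file name. As a minimal sketch of one alternative, the helper below derives the name with Uri and Path.GetFileName. The names ImagePaths and ToLocalPath, the targetFolder parameter, and the "image" fallback are hypothetical and not part of the original answer; the sketch also assumes the src value is an absolute URL.

```csharp
using System;
using System.IO;

public static class ImagePaths
{
    // Hypothetical helper: maps an absolute image URL to a file path inside targetFolder.
    public static string ToLocalPath(string imageUrl, string targetFolder)
    {
        // Throws UriFormatException if imageUrl is not an absolute URL.
        Uri uri = new Uri(imageUrl, UriKind.Absolute);

        // AbsolutePath excludes the query string, so ".../image.jpg?x=1" yields "image.jpg".
        string fileName = Path.GetFileName(uri.AbsolutePath);

        // URLs that end in '/' have no file name; fall back to a placeholder (an assumption).
        if (string.IsNullOrEmpty(fileName))
        {
            fileName = "image";
        }

        return Path.Combine(targetFolder, fileName);
    }
}
```

Inside the foreach loop, a call such as wClient.DownloadFile(urlDownload, ImagePaths.ToLocalPath(urlDownload, folder)) could then replace the Substring expression, where folder is whatever directory you want the images saved to. Note that AbsolutePath is percent-encoded, so a name containing spaces would come out as "my%20pic.jpg".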
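Answer 1 (score: 0)
Take a look at the Html Agility Pack.
What you need to do is download and parse the HTML, then process the elements you are interested in. It is a good tool for this kind of task.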
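As a rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced and the page's HTML is already available as a string (for example from the asker's WorkerClass.ScreenScrape). The names ImgExtractor and GetImageSources are hypothetical:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class ImgExtractor
{
    // Hypothetical helper: returns the src value of every <img> element that has one.
    public static List<string> GetImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sources = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return sources;
        }

        foreach (HtmlNode node in nodes)
        {
            string src = node.GetAttributeValue("src", string.Empty);
            if (src.Length != 0)
            {
                sources.Add(src);
            }
        }

        return sources;
    }
}
```

The returned values are the raw attribute text, so relative paths would still need to be resolved against the page URL before downloading. In a Windows Forms project, HtmlDocument may also need to be qualified as HtmlAgilityPack.HtmlDocument to avoid a clash with System.Windows.Forms.HtmlDocument.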
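Answer 2 (score: 0)
For anyone interested, here is the solution. It is exactly what Damith described. I found the Html Agility Pack to be fairly broken; it was the first thing I tried. This ended up being the more workable approach, and here is my final code.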
    private void button2_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        WebBrowser browser = new WebBrowser();
        // Subscribe before navigating so the handler is in place when loading completes.
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(DownloadFiles);
        browser.Navigate(url);
    }

    private void DownloadFiles(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The browser created in button2_Click is local to that method, so recover it from the sender.
        WebBrowser browser = (WebBrowser)sender;
        if (browser != null)
        {
            if (browser.Document != null)
            {
                HtmlElementCollection collection = browser.Document.GetElementsByTagName("img");
                if (collection != null)
                {
                    foreach (HtmlElement element in collection)
                    {
                        string urlDownload = element.GetAttribute("src");
                        // Skip img elements that have no src attribute.
                        if (urlDownload != null && urlDownload.Length != 0)
                        {
                            using (WebClient wClient = new WebClient())
                            {
                                // + 1 drops the leading '/' so the file name joins cleanly with the folder.
                                wClient.DownloadFile(urlDownload, "C:\\users\\folder\\location\\" + urlDownload.Substring(urlDownload.LastIndexOf('/') + 1));
                            }
                        }
                    }
                }
            }
        }
    }
    }
    }