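I'm in a bit of a pickle. I want to scrape a series of images from a website. I know how to do that in general, but I need to filter the images by where they sit in the page.
For example, I want to grab the images inside the div with the id "theseImages", but another div, with the id "notTheseImages", holds a second set of images. Looping over every element in the HtmlElementCollection returned for the tag "img" ignores the divs, so it also picks up the images from "notTheseImages".
Is there a way to loop over the images while checking which div each one is in?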
Answer 0 (score: 0)
This should let you make that selection against your current HTML, and may be useful for future scenarios too :)
    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Windows.Forms;
    //
    // Resolves a chain of simple selectors ("#id", ".class" or a tag name),
    // each one applied to the elements matched by the previous one.
    // Returns an empty array when nothing matches.
    protected HtmlElement[] GetElementsByParent(HtmlDocument document, HtmlElement baseElement = null, params string[] singleSelectors)
    {
        if (singleSelectors == null || singleSelectors.Length == 0)
        {
            throw new ArgumentException("Please give at least 1 selector!", "singleSelectors");
        }

        var result = new List<HtmlElement>();
        bool last = singleSelectors.Length == 1;
        string[] remaining = singleSelectors.Skip(1).ToArray();

        string singleSelector = singleSelectors[0];
        if (string.IsNullOrWhiteSpace(singleSelector))
        {
            return new HtmlElement[0];
        }
        singleSelector = singleSelector.Trim();

        if (singleSelector.StartsWith("#"))
        {
            // Ids are unique, so the lookup is document-wide and ignores baseElement.
            HtmlElement item = document.GetElementById(singleSelector.Substring(1));
            if (item == null)
            {
                return new HtmlElement[0];
            }
            if (last)
            {
                result.Add(item);
            }
            else
            {
                result.AddRange(GetElementsByParent(document, item, remaining));
            }
        }
        else if (singleSelector.StartsWith("."))
        {
            if (baseElement == null)
            {
                baseElement = document.Body;
            }
            if (baseElement == null)
            {
                return new HtmlElement[0];
            }

            string className = singleSelector.Substring(1);

            // Note: only the direct children of baseElement are inspected here.
            foreach (HtmlElement child in baseElement.Children)
            {
                // Depending on the rendering mode, the WebBrowser control may expose
                // the class list as "className" rather than "class", so try both.
                string cls = child.GetAttribute("className");
                if (string.IsNullOrWhiteSpace(cls))
                {
                    cls = child.GetAttribute("class");
                }
                if (string.IsNullOrWhiteSpace(cls))
                {
                    continue;
                }
                if (!cls.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Contains(className))
                {
                    continue;
                }
                if (last)
                {
                    result.Add(child);
                }
                else
                {
                    result.AddRange(GetElementsByParent(document, child, remaining));
                }
            }
        }
        else
        {
            // Plain tag name: matches every descendant with that tag.
            HtmlElementCollection elements = baseElement != null
                ? baseElement.GetElementsByTagName(singleSelector)
                : document.GetElementsByTagName(singleSelector);

            foreach (HtmlElement item in elements)
            {
                if (last)
                {
                    result.Add(item);
                }
                else
                {
                    result.AddRange(GetElementsByParent(document, item, remaining));
                }
            }
        }

        return result.ToArray();
    }
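    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire more than once (for example once per frame),
        // so wait until the whole document has finished loading.
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        // Here we can query: every <img> inside the element with id "theseImages".
        var result = GetElementsByParent(webBrowser1.Document, null, "#theseImages", "img");
    }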
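The result will contain the images under #theseImages.
Note that GetElementsByParent is largely untested: I only tried it against your use case, where it seemed to work fine.
Don't forget to start the query only once you are sure the document has finished loading ;)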
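If all you need is the one case from the question, a shorter route may be enough: look up the container by its id, then ask that element (rather than the whole document) for its img descendants. The sketch below assumes the same WinForms WebBrowser setup; the class name ImageScraper and the method GetImageSources are made up for illustration and are not part of any library.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

public static class ImageScraper
{
    // Hypothetical helper: collects the "src" of every <img> that sits inside
    // the element with the given id. Images elsewhere in the page are never
    // visited, because the search starts at the container, not the document.
    public static List<string> GetImageSources(HtmlDocument document, string containerId)
    {
        var sources = new List<string>();
        if (document == null)
        {
            return sources;
        }

        HtmlElement container = document.GetElementById(containerId);
        if (container == null)
        {
            // No element with that id: return an empty list.
            return sources;
        }

        foreach (HtmlElement img in container.GetElementsByTagName("img"))
        {
            string src = img.GetAttribute("src");
            if (!string.IsNullOrEmpty(src))
            {
                sources.Add(src);
            }
        }

        return sources;
    }
}
```

From the DocumentCompleted handler this would be called as ImageScraper.GetImageSources(webBrowser1.Document, "theseImages"). The images in "notTheseImages" are skipped simply because that div is not a descendant of the container being searched.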