I want to scrape div elements that have certain classes from an HTML page.
I am using this:
HtmlNode authorNode =(HtmlNode) doc.DocumentNode.Descendants("div").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Split(' ').Any(b => b.Equals("byline") && b.Equals("list-pipes")));
I get this exception:
System.InvalidCastException was unhandled
HResult=-2147467262
Message=Unable to cast object of type 'WhereEnumerableIterator`1[HtmlAgilityPack.HtmlNode]' to type 'HtmlAgilityPack.HtmlNode'.
Source=Project1
StackTrace:
at Project1.Scraper.processBI_Article(String uri) in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 233
at Project1.Scraper.processNode(String uri, HtmlNode parentNode) in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 194
at Project1.Scraper.ExecuteScraping() in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 107
at Project1.WebscrapingMain.Main() in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\WebscrapingMain.cs:line 64
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
I'm confused about what I need to do to make this work. Any help would be appreciated.
Answer 0 (score: 1)
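The code returns an enumerable of HtmlNode, not a single node: you are using .Where, which returns all items that match, so the cast to HtmlNode fails. If you are only interested in the first item, use First or FirstOrDefault; if you know there is exactly one item, use Single or SingleOrDefault, rather than relying on Where alone.
So: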
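// FirstOrDefault returns null when no div matches.
// Both names are looked up in the whole class list: a single token can never
// equal "byline" and "list-pipes" at once, so Any(b => ... && ...) never matched.
HtmlNode authorNode = doc.DocumentNode.Descendants("div")
    .Where(d => d.Attributes.Contains("class")
        && d.Attributes["class"].Value.Split(' ').Contains("byline")
        && d.Attributes["class"].Value.Split(' ').Contains("list-pipes"))
    .FirstOrDefault();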
Or assign the result to an enumerable of HtmlNode:
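// Both names are looked up in the whole class list, because a single token
// can never equal "byline" and "list-pipes" at the same time.
IEnumerable<HtmlNode> authorNodes = doc.DocumentNode.Descendants("div")
    .Where(d => d.Attributes.Contains("class")
        && d.Attributes["class"].Value.Split(' ').Contains("byline")
        && d.Attributes["class"].Value.Split(' ').Contains("list-pipes"));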
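To see how Where, FirstOrDefault and Single differ without pulling in HtmlAgilityPack, here is a minimal sketch that runs the same kind of query over plain strings. The sample class values and the HasAll helper are made up for illustration; they are not part of the original code or of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical stand-ins for the class attribute values of three divs.
        var classAttributes = new List<string> { "byline list-pipes", "byline", "footer" };

        // Where yields every match as a sequence, never a single element.
        IEnumerable<string> withByline = classAttributes.Where(c => HasAll(c, "byline"));
        Console.WriteLine(withByline.Count());

        // FirstOrDefault yields the first match, or null (for reference types) if none.
        string firstBoth = classAttributes.FirstOrDefault(c => HasAll(c, "byline", "list-pipes"));
        string none = classAttributes.FirstOrDefault(c => HasAll(c, "sidebar"));
        Console.WriteLine(firstBoth ?? "(null)");
        Console.WriteLine(none ?? "(null)");

        // Single throws InvalidOperationException when more than one element matches.
        try
        {
            classAttributes.Single(c => HasAll(c, "byline"));
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Single: more than one match");
        }
    }

    // Hypothetical helper: true when every required name appears
    // in the space-separated class list.
    static bool HasAll(string classAttribute, params string[] required)
    {
        string[] tokens = classAttribute.Split(' ');
        return required.All(r => tokens.Contains(r));
    }
}
```

If a missing node is a normal case for your pages, FirstOrDefault plus a null check is probably the safer choice; First and Single throw when nothing matches.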