HtmlAgilityPack InvalidCastException when scraping nodes with certain classes

Time: 2014-04-21 16:58:39

Tags: exception dom html-agility-pack

I want to scrape the div elements that carry certain classes from an HTML page. I am using this:

HtmlNode authorNode =(HtmlNode) doc.DocumentNode.Descendants("div").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Split(' ').Any(b => b.Equals("byline") && b.Equals("list-pipes")));

I get this exception:

System.InvalidCastException was unhandled
  HResult=-2147467262
  Message=Unable to cast object of type 'WhereEnumerableIterator`1[HtmlAgilityPack.HtmlNode]' to type 'HtmlAgilityPack.HtmlNode'.
  Source=Project1
  StackTrace:
       at Project1.Scraper.processBI_Article(String uri) in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 233
       at Project1.Scraper.processNode(String uri, HtmlNode parentNode) in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 194
       at Project1.Scraper.ExecuteScraping() in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\Scraper.cs:line 107
       at Project1.WebscrapingMain.Main() in C:\Users\jgarber\Documents\Visual Studio 2010\Projects\Project1\Project1\WebscrapingMain.cs:line 64
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: 

I'm confused about what I need to do to make this work. Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

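The code returns an enumerable of HtmlNode, not a single node: you are using .Where, which returns all matching items, so the cast to HtmlNode fails at run time. If you are only interested in the first item, use First or FirstOrDefault; if you know there is exactly one item, use Single or SingleOrDefault instead of relying on Where alone.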

So:

 HtmlNode authorNode = doc.DocumentNode.Descendants("div")
                           .Where(d => d.Attributes.Contains("class") 
                                       && d.Attributes["class"].Value.Split(' ')
                                       .Any(b => b.Equals("byline") && b.Equals("list-pipes")))
                           .FirstOrDefault();
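
The four operators named above differ in how they treat empty sequences and sequences with more than one match. The following is a minimal sketch of that behavior using plain strings as hypothetical stand-ins for the matched HtmlNode objects, so it runs without HtmlAgilityPack:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FirstVersusSingle
{
    static void Main()
    {
        // Hypothetical stand-ins for the nodes a query might match.
        var none = new List<string>();
        var one = new List<string> { "byline" };
        var many = new List<string> { "byline", "byline" };

        // FirstOrDefault: null for an empty sequence, otherwise the first element.
        Console.WriteLine(none.FirstOrDefault() == null);   // True
        Console.WriteLine(many.FirstOrDefault());           // byline

        // SingleOrDefault: null when empty, the element when there is exactly one.
        Console.WriteLine(none.SingleOrDefault() == null);  // True
        Console.WriteLine(one.SingleOrDefault());           // byline

        // First and Single throw InvalidOperationException on an empty sequence;
        // Single and SingleOrDefault also throw when more than one element matches.
        try { none.First(); }
        catch (InvalidOperationException) { Console.WriteLine("First threw on empty"); }

        try { many.SingleOrDefault(); }
        catch (InvalidOperationException) { Console.WriteLine("SingleOrDefault threw on two matches"); }
    }
}
```

With FirstOrDefault or SingleOrDefault, authorNode can be null when nothing matches, so it is worth checking for null before using it.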

Alternatively, keep the Where call and assign its result to an IEnumerable<HtmlNode>:

 IEnumerable<HtmlNode> authorNodes = doc.DocumentNode.Descendants("div")
                           .Where(d => d.Attributes.Contains("class") 
                                       && d.Attributes["class"].Value.Split(' ')
                                       .Any(b => b.Equals("byline") && b.Equals("list-pipes")));
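
One thing that may be worth checking separately from the cast: the lambda passed to Any compares a single token b against both names at once, and one string cannot equal both "byline" and "list-pipes", so the filter as written looks like it can never match. Assuming the intent is to find divs that carry both classes (a guess; the question does not say), a sketch of the check could look like this. HasAllClasses is a hypothetical helper, not part of HtmlAgilityPack, and it works on a plain class-attribute string:

```csharp
using System;
using System.Linq;

class ClassTokenCheck
{
    // Hypothetical helper: true when every required class name appears
    // among the space-separated tokens of a class attribute value.
    static bool HasAllClasses(string classValue, params string[] required)
    {
        var tokens = classValue.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return required.All(r => tokens.Contains(r));
    }

    static void Main()
    {
        string value = "byline list-pipes";

        // Predicate from the question: no single token equals both names.
        bool original = value.Split(' ')
            .Any(b => b.Equals("byline") && b.Equals("list-pipes"));
        Console.WriteLine(original); // False

        // Checking each required class separately.
        Console.WriteLine(HasAllClasses(value, "byline", "list-pipes"));    // True
        Console.WriteLine(HasAllClasses("byline", "byline", "list-pipes")); // False
    }
}
```

In the query, the helper would take d.Attributes["class"].Value in place of the Split(...).Any(...) chain. If a div with either class should match, replacing && with || in the original lambda would be the smaller change.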