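This is my first time using the HTML Agility Pack, and I am following the code samples section to parse URLs out of some HTML. I am getting an error and I can't work out why. Can anyone point out what I'm doing wrong?
Here is the source (html is the incoming HTML string):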
    StringBuilder sb = new StringBuilder();
    HtmlDocument htmldoc = new HtmlDocument();
    htmldoc.LoadHtml(html);
    foreach (HtmlNode link in htmldoc.DocumentNode.SelectNodes("//a[@HREF]"))
    {
        HtmlAttribute att = link.Attributes["HREF"];
        sb.AppendLine(att.Value + "|");
    }
    return sb.ToString();
When I debug the application, I get the following error (the debugger stops at the "foreach" statement):
System.NullReferenceException was unhandled
Message=Object reference not set to an instance of an object.
Source=ScreenScraper
StackTrace:
at ScreenScraper.its.GetITSLoadID(String html) in C:\Web_Projects\ScreenScaper\ScreenScraper\its.cs:line 22
at ScreenScraper.frm1.btnStartScraping_Click(Object sender, EventArgs e) in C:\Web_Projects\ScreenScaper\ScreenScraper\frm1.cs:line 43
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.Run(Form mainForm)
at ScreenScraper.Program.Main() in C:\Web_Projects\ScreenScaper\ScreenScraper\Program.cs:line 18
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.Runtime.Hosting.ManifestRunner.Run(Boolean checkAptModel)
at System.Runtime.Hosting.ManifestRunner.ExecuteAsAssembly()
at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext, String[] activationCustomData)
at System.Runtime.Hosting.ApplicationActivator.CreateInstance(ActivationContext activationContext)
at System.Activator.CreateInstance(ActivationContext activationContext)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssemblyDebugInZone()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
Answer 0 (score: 2)
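The Html Agility Pack has a "design flaw": SelectNodes returns null instead of an empty collection when nothing matches. Passing that null straight to foreach is what throws the NullReferenceException, so you need to check for it first: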
    HtmlNodeCollection list = htmldoc.DocumentNode.SelectNodes("//a[@HREF]");
    if (list != null)
    {
        foreach (HtmlNode link in list)
        ...
    }
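By the way, every tag and attribute name in the XPATH expression must be lowercase, even if it is written differently in the HTML text (HTML is case-insensitive, and the Html Agility Pack's default XPATH convention is to use lowercase names). That is why "//a[@HREF]" matches nothing here. So you should write this: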
    HtmlNodeCollection list = htmldoc.DocumentNode.SelectNodes("//a[@href]");
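Putting both fixes together, the method from the question might look roughly like the sketch below. The class name LinkExtractor and the method name ExtractLinks are made up for illustration (the original method is GetITSLoadID in its.cs, but its full signature isn't shown), and returning an empty string when there are no links is an assumption about what the caller wants.

```csharp
using System.Text;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns one "url|" line per <a> element that has an href.
    public static string ExtractLinks(string html)
    {
        StringBuilder sb = new StringBuilder();
        HtmlDocument htmldoc = new HtmlDocument();
        htmldoc.LoadHtml(html);

        // Lowercase names in the XPath, regardless of how the HTML spells them.
        HtmlNodeCollection list = htmldoc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (list == null)
        {
            return string.Empty; // assumption: "no links" means empty output
        }

        foreach (HtmlNode link in list)
        {
            // The [@href] filter means each matched node has this attribute.
            HtmlAttribute att = link.Attributes["href"];
            sb.AppendLine(att.Value + "|");
        }
        return sb.ToString();
    }
}
```

With this shape, a page that contains no links simply produces an empty string instead of crashing the button click handler.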