I am trying to extract a single piece of data from
http://www.dsebd.org/displayCompany.php?name=NBL
The field I need is shown in the attached image.
XPath: /html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font
Error: an exception is thrown and no data is found with that XPath. "An unhandled exception of type 'System.Net.WebException' occurred in HtmlAgilityPack.dll"
Source code:
static void Main(string[] args)
{
    /************************************************************************/
    string tickerid = "Bse_Prc_tick";
    HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");
    if (doc != null)
    {
        // Fetch the stock price from the Web page
        string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
        Console.WriteLine(stockprice);
    }
    Console.WriteLine("ReadKey Starts........");
    Console.ReadKey();
}
Answer 0 (score: 2)
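OK, I looked into it. The XPath we are using is simply incorrect. The real fun starts when you try to find out where the error is.
Just look at the source of the page you are working with: besides the many errors that get in the way of XPath, it even contains multiple HTML tags...
Chrome Dev Tools, and the tools you are using, work on the DOM tree as corrected by the browser (everything packed into a single html node, some tbody elements added, and so on).
Because the HTML structure is broken, HtmlAgilityPack's parsing of it is broken as well.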
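To illustrate the tbody point, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced; the `TbodyDemo` class, its `Select` helper and the sample markup are made up for illustration and are not taken from the real page. The raw HTML has no tbody, so an XPath copied from the browser (which includes tbody) finds nothing, while the same path without tbody matches.

```csharp
using System;
using HtmlAgilityPack;

public static class TbodyDemo
{
    // Hypothetical markup: a table written without <tbody>, as raw server HTML often is.
    public const string RawHtml =
        "<html><body><table><tr><td>Last Trade:</td><td><font>12.30</font></td></tr></table></body></html>";

    // Parses RawHtml as-is (no browser-style correction) and returns the
    // inner text of the first node matching the XPath, or null if nothing matches.
    public static string Select(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(RawHtml);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

With this sample, `TbodyDemo.Select("/html/body/table/tbody/tr/td[2]/font")` returns null, whereas `TbodyDemo.Select("/html/body/table/tr/td[2]/font")` returns "12.30". Note that `SelectSingleNode` returns null when nothing matches, so check the result before reading `InnerText`.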
Depending on the situation, you can either use RegExp or simply search the source for known elements (faster, but less flexible).
For example:
...
using System.Net; // required for WebClient
...
class Program
{
    // entry point of the console app
    static void Main(string[] args)
    {
        // URL to download
        // "var" lets the compiler infer the type (string here)
        var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";
        // WebClient implements IDisposable; "using" calls Dispose() as soon as the block ends
        using (WebClient client = new WebClient())
        {
            // download the response as a string; in this case it is the page's HTML
            string htmlCode = client.DownloadString(url);
            // drop everything before "Last Trade:" so the anchors below match
            // the first occurrence after that label
            int labelIndex = htmlCode.IndexOf("Last Trade:");
            if (labelIndex < 0) { throw new InvalidOperationException("\"Last Trade:\" was not found in the page"); }
            htmlCode = htmlCode.Substring(labelIndex);
            // take the text between the two anchors, then trim leading and trailing whitespace
            htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
            Console.WriteLine(htmlCode);
        }
        Console.ReadLine();
    }
}
// http://stackoverflow.com/a/17253735/3147740 <- copied from here
// Extension class that adds the Substring() overload used in the code above; it does what its comments say
public static class StringExtensions
{
    /// <summary>
    /// takes a substring between two anchor strings (or the end of the string if that anchor is null)
    /// </summary>
    /// <param name="this">a string</param>
    /// <param name="from">an optional string to search after</param>
    /// <param name="until">an optional string to search before</param>
    /// <param name="comparison">an optional comparison for the search</param>
    /// <returns>a substring based on the search</returns>
    public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
    {
        var fromLength = (from ?? string.Empty).Length;
        var startIndex = !string.IsNullOrEmpty(from)
            ? @this.IndexOf(from, comparison) + fromLength
            : 0;
        if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }
        var endIndex = !string.IsNullOrEmpty(until)
            ? @this.IndexOf(until, startIndex, comparison)
            : @this.Length;
        if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }
        var subString = @this.Substring(startIndex, endIndex - startIndex);
        return subString;
    }
}
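For the RegExp route mentioned earlier, a rough sketch could look like the following. The `LastTradeRegex` class is a hypothetical name, and the pattern only assumes the same anchors as the code above (the "Last Trade:" label, then `2">`, then `</font></td>`); the real page markup may differ, so treat it as a starting point rather than a tested scraper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastTradeRegex
{
    // Assumed shape, inferred from the anchors used in the answer's code:
    //   Last Trade: ... 2">VALUE</font></td>
    // Singleline lets "." also match newlines, since the label and the value
    // may sit on different lines of the page source.
    private static readonly Regex Pattern = new Regex(
        "Last Trade:.*?2\">\\s*(?<value>.*?)\\s*</font></td>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the captured value, or null when the pattern does not match.
    public static string Extract(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups["value"].Value : null;
    }
}
```

You would call it as `LastTradeRegex.Extract(htmlCode)` on the downloaded string. Unlike the Substring version it returns null instead of throwing when the anchors are missing, which is a design choice you may or may not want.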
Answer 1 (score: 0)
Wrap your code in a try-catch to get more information about the exception.
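Something along these lines, as a sketch only. It uses WebClient instead of HtmlWeb to stay self-contained, and `FetchDemo` / `TryDownload` are names I made up for the example. The point is that a WebException carries a Status and, when the server actually answered, an HTTP response you can inspect.

```csharp
using System;
using System.Net;

public static class FetchDemo
{
    // Hypothetical helper: returns the page HTML, or null after printing
    // the details of the WebException that prevented the download.
    public static string TryDownload(string url)
    {
        try
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }
        catch (WebException ex)
        {
            // Status distinguishes DNS failures, refused connections, timeouts, protocol errors, ...
            Console.WriteLine("Status:  " + ex.Status);
            Console.WriteLine("Message: " + ex.Message);

            // Response is only set when the server replied (for example with a 404 or 500).
            var response = ex.Response as HttpWebResponse;
            if (response != null)
            {
                Console.WriteLine("HTTP:    " + (int)response.StatusCode + " " + response.StatusDescription);
            }
            return null;
        }
    }
}
```

Call `FetchDemo.TryDownload(@"http://www.dsebd.org/displayCompany.php?name=NBL")` and look at what gets printed; that should tell you whether the request itself is failing before any XPath is evaluated.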