But I can't get the results table, because Fiddler shows that the results are loaded by a LazyLoading method that returns JSON.
My code is:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true");
// Get all tables in the document
HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
// Iterate all rows in the first table
HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
var data = rows.Skip(1).Take(10).Select(x => new TableRow()
{
    Price = x.SelectNodes(".//td").ToList()[4].InnerText,
    Operator = x.SelectNodes(".//td").ToList()[15].InnerText,
    DepartureDate = x.SelectNodes(".//td").ToList()[6].InnerText,
    DestinationRegion = x.SelectNodes(".//td").ToList()[7].InnerText
}).ToList();
Update: code for the second site:
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://sletat.ru/");//MUST BE THIS HEADER
string result = wc.DownloadString("http://module.sletat.ru/Main.svc/GetTours?cityFromId=832&countryId=35&cities=&meals=&stars=&hotels=&s_adults=1&s_kids=0&s_kids_ages=&s_nightsMin=6&s_nightsMax=16&s_priceMin=0&s_priceMax=&currencyAlias=RUB&s_departFrom=25%2F06%2F2012&s_departTo=31%2F07%2F2012&visibleOperators=&s_hotelIsNotInStop=true&s_hasTickets=true&s_ticketsIncluded=true&debug=0&filter=0&f_to_id=&requestId=19198631&pageSize=20&pageNumber=1&updateResult=1&includeDescriptions=1&includeOilTaxesAndVisa=1&userId=&jskey=1&callback=_jqjsp&_1340633427022=");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = json["GetToursResult"]["Data"]["aaData"] as object[];
// var operators = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var temp = prices.Take(20).Select(x => new TableRow
{
    Operator = (x as object[])[40].ToString(),
    //Price = x["operatorPrice"].ToString(),
    //DepartureDate = x["checkinDate"].ToString(),
    //DestinationRegion = ((Dictionary<string, object>)x["country"])["englishName"].ToString()
}).ToList();
string str = "";
foreach (var tableRow in temp)
{
    str += tableRow.Operator + "<br />";
}
Response.Write(str);
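Everything I tried this way works fine, but the problem is that this link only works for about 30 minutes, after which I have to plug in a new link. Is there any way to solve this? (Only the second site has this problem.) Thanks again,
Answer 0 (score: 0)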
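The data actually comes from the LazyLoading.ashx/getResult handler, whose page=# and pageSize=# query parameters can be adjusted dynamically.
So instead of parsing the HTML, you just need to fetch the JSON data from that URL and parse it. For example: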
WebClient wc = new WebClient();
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var data = from p in prices
           select new
           {
               OperatorID = p["operatorID"],
               Price = p["operatorPrice"],
               Country = ((Dictionary<string, object>)p["country"])["englishName"],
               CheckinDate = p["checkinDate"]
           };
Console.WriteLine(data);
In LINQPad, this produces something like the following:
OperatorID  Price  Country  CheckinDate
0           1,27   Greece   2012-06-28
0           55,90  Greece   2012-06-28
0           67,34  Greece   2012-06-28
There are more rows, depending on how many you request...
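Since page and pageSize can be varied, the request URL could be assembled by a small function rather than hard-coded. The following is only a sketch: the class BronniUrl and its Build method are hypothetical names introduced for illustration, and the jsonp and _ values are simply copied from the URL above, on the assumption that the handler still accepts them unchanged.

```csharp
using System;

static class BronniUrl
{
    // Endpoint taken from the URL used in the answer.
    const string BaseUrl = "http://beta.remote.bronni.ru/LazyLoading.ashx/getResult";

    // Hypothetical helper: builds the request URL for one page of results.
    // The jsonp and _ values are copied verbatim from the original URL
    // (assumption: the handler does not validate them).
    public static string Build(string searchId, int page, int pageSize)
    {
        return BaseUrl
            + "?jsonp=jQuery17207647891761735082_1340131755603"
            + "&id=" + Uri.EscapeDataString(searchId)
            + "&page=" + page
            + "&pageSize=" + pageSize
            + "&_=1340131756631";
    }
}

class Program
{
    static void Main()
    {
        // Second page of 50 results for the search id from the question.
        Console.WriteLine(BronniUrl.Build("c7a6a33a-174e-426d-b127-828ee612c36e", 2, 50));
    }
}
```

The resulting string can be passed to wc.DownloadString in place of the literal URL.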
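Note: the line result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1); is needed because the JSONP result starts with this junk:
jQuery17207647891761735082_1340131755603({"
and ends with }), which makes JavaScriptSerializer choke when it tries to parse the response, so the wrapper has to be removed.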
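The stripping step can be pulled out into a helper that also fails clearly when no JSON object is present. This is a minimal sketch, not part of the original answer: JsonpHelper and StripJsonpPadding are hypothetical names, and the sample payload is made up for illustration.

```csharp
using System;

static class JsonpHelper
{
    // Hypothetical helper: returns the text between the first '{' and the
    // last '}' of a JSONP response, i.e. the JSON object without its
    // callback wrapper.
    public static string StripJsonpPadding(string jsonp)
    {
        if (jsonp == null) throw new ArgumentNullException("jsonp");
        int start = jsonp.IndexOf('{');
        int end = jsonp.LastIndexOf('}');
        if (start < 0 || end < start)
            throw new FormatException("No JSON object found in the JSONP response.");
        return jsonp.Substring(start, end - start + 1);
    }
}

class Program
{
    static void Main()
    {
        // Made-up sample; the real response is much larger.
        string jsonp = "jQuery17207647891761735082_1340131755603({\"result\":{\"prices\":[]}})";
        Console.WriteLine(JsonpHelper.StripJsonpPadding(jsonp));
        // prints {"result":{"prices":[]}}
    }
}
```

Like the one-liner it replaces, this assumes the payload is a JSON object; a response whose top level is an array would need the brackets located instead.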
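Update:
Interestingly, the ASHX handler that returns the data appears to require a Referer header in the request; otherwise, the response does not include the operator information. The Referer can't be just anything you like; the handler seems to look specifically for http://agent.bronni.ru.
Basically, all you need to do is: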
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://agent.bronni.ru"); // MUST BE THIS HEADER
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var data = from p in prices
           select new
           {
               OperatorID = p["operatorID"],
               Price = p["operatorPrice"],
               Country = ((Dictionary<string, object>)p["country"])["englishName"],
               Hotel = ((Dictionary<string, object>)p["hotel"])["englishName"],
               Operator = ((Dictionary<string, object>)p["operator"])["englishName"], // OPERATOR
               CheckinDate = p["checkinDate"]
           };
OperatorID  Price  Country  Hotel                            Operator           CheckinDate
19681       1,27   Greece   Julia Hotel                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Forest Park                      Mouzenidis Travel  2012-06-28
19681       1,27   Greece   Kassandra Mare (п-ов Кассандра)  Mouzenidis Travel  2012-06-28
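Update 2:
I decided to compare the performance of the out-of-the-box JavaScriptSerializer with the JSON.NET serializer. In all my tests with different record counts (50, 1000, 3000), JSON.NET was at least twice as fast as JavaScriptSerializer, and in some cases, with the smaller record sets, up to 10 times faster.
If you decide to use the JSON.NET library, here is code that gives you the same results as the JavaScriptSerializer code above: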
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://agent.bronni.ru");
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631");
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JObject o = JObject.Parse(result);
var data = from x in o["result"]["prices"]
           select new
           {
               OperatorID = x["operatorID"],
               Price = x["operatorPrice"],
               Country = x["country"]["englishName"],
               Hotel = x["hotel"]["englishName"],
               Operator = x["operator"]["englishName"],
               CheckinDate = x["checkinDate"]
           };
Console.WriteLine(data);
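A comparison like the one described in Update 2 could be timed with a small Stopwatch harness. This is a rough sketch, not the measurement code actually used for the numbers above: ParseTimer is a hypothetical name, and the parser in Main is a stand-in so the example runs without extra references.

```csharp
using System;
using System.Diagnostics;

static class ParseTimer
{
    // Hypothetical helper: calls parse(json) the given number of times
    // and returns the total elapsed time.
    public static TimeSpan Time(Func<string, object> parse, string json, int iterations)
    {
        parse(json); // warm-up call so JIT compilation is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(json);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}

class Program
{
    static void Main()
    {
        string json = "{\"result\":{\"prices\":[]}}"; // stand-in payload

        // Stand-in parser. To compare the two libraries, pass
        // s => new JavaScriptSerializer().DeserializeObject(s) or
        // s => JObject.Parse(s) instead.
        Func<string, object> parse = s => s.Length;

        TimeSpan elapsed = ParseTimer.Time(parse, json, 1000);
        Console.WriteLine("Elapsed: " + elapsed.TotalMilliseconds + " ms");
    }
}
```

Running each parser over the same downloaded string, with the download itself kept outside the timed loop, keeps network latency out of the comparison.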