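I have a page that I need to parse (scrape), but first I have to get past a verification check. The check uses some control numbers, which I managed to calculate. However, when I pass these control numbers and the other parameters via a POST request, the page seems to refresh itself and generate new control numbers. The ones I calculated then fail the check, and I never get access to the page I need.
First, I use HtmlAgilityPack to load the page and read the values needed for the control numbers: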
HtmlWeb web = new HtmlWeb();
HtmlDocument mainPage = web.Load(url);
int controlNumber = FindControlNumber(); // my own routine that extracts the values from mainPage
After that, I try to send the calculated number with a POST request:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(newUrl);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
string data = @"id=" + id + "&controlNumber=" + controlNumber;
byte[] dataStream = Encoding.UTF8.GetBytes(data);
request.ContentLength = dataStream.Length;
Stream newStream = request.GetRequestStream();
newStream.Write(dataStream, 0, dataStream.Length);
newStream.Close();
HttpWebResponse webResponse = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(webResponse.GetResponseStream());
string html = sr.ReadToEnd();
However, instead of the page I want, I get the initial page back with the message "Wrong control number".
What am I doing wrong?
Answer (score 0):
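All I actually had to do was enable cookies: fetch the first page with HttpWebRequest/HttpWebResponse (instead of HtmlAgilityPack's HtmlWeb) using a CookieContainer, and then attach that same CookieContainer to the POST request. Presumably the site ties the control number to a session cookie, so without that cookie the POST looked like a new visit and was checked against a freshly generated number.
Here is the code that works: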
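using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;
// One CookieContainer shared by both requests, so cookies set by the GET are sent back with the POST
CookieContainer cookies = new CookieContainer();
// 1) GET the page that contains the control numbers, with cookies enabled
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(url);
getRequest.Method = "GET";
getRequest.KeepAlive = true;
getRequest.CookieContainer = cookies;
HtmlDocument mainPage = new HtmlDocument();
using (HttpWebResponse response = (HttpWebResponse)getRequest.GetResponse())
using (Stream stream = response.GetResponseStream())
{
    mainPage.Load(stream); // parse with HtmlAgilityPack, as in the question
}
// Calculate control number from mainPage...
// 2) POST the form data, reusing the same CookieContainer (a second variable name avoids redeclaring "request")
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(newUrl);
postRequest.Method = "POST";
postRequest.ContentType = "application/x-www-form-urlencoded";
postRequest.CookieContainer = cookies;
string data = "id=" + id + "&controlNumber=" + controlNumber;
byte[] dataStream = Encoding.UTF8.GetBytes(data);
postRequest.ContentLength = dataStream.Length;
using (Stream newStream = postRequest.GetRequestStream())
{
    newStream.Write(dataStream, 0, dataStream.Length);
}
string html;
using (HttpWebResponse webResponse = (HttpWebResponse)postRequest.GetResponse())
using (StreamReader sr = new StreamReader(webResponse.GetResponseStream()))
{
    html = sr.ReadToEnd();
}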
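To see why sharing the CookieContainer matters without hitting the real site, the two requests can be simulated offline. This is a minimal sketch under assumptions: the example.com URLs, the cookie name `SessionId` and its value are invented, `CookieSentWithPost` is a hypothetical helper, and it assumes the site keeps its control-number state in a cookie-based session, which the fix above suggests but does not confirm. It only uses `CookieContainer.SetCookies` (to mimic the `Set-Cookie` header of the GET response) and `CookieContainer.GetCookieHeader` (the `Cookie` header a request through that container would carry).

```csharp
using System;
using System.Net;

static class CookieDemo
{
    // Hypothetical URLs standing in for the real form page and submit target
    static readonly Uri PageUri = new Uri("http://example.com/form");
    static readonly Uri PostUri = new Uri("http://example.com/submit");

    // Simulates the GET storing the server's session cookie in getCookies,
    // then returns the Cookie header a POST using postCookies would send.
    public static string CookieSentWithPost(CookieContainer getCookies, CookieContainer postCookies)
    {
        // "SessionId=abc123" is an invented Set-Cookie value for illustration
        getCookies.SetCookies(PageUri, "SessionId=abc123; path=/");
        return postCookies.GetCookieHeader(PostUri);
    }

    static void Main()
    {
        // The answer's approach: the same container on both requests
        CookieContainer shared = new CookieContainer();
        Console.WriteLine("shared:   '" + CookieSentWithPost(shared, shared) + "'");

        // The question's situation: the POST has no container holding the GET's cookie
        Console.WriteLine("separate: '" + CookieSentWithPost(new CookieContainer(), new CookieContainer()) + "'");
    }
}
```

With the shared container the POST carries the session cookie; with separate containers the header is empty, so the server has no way to link the POST to the page where the control number was generated.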