I'm trying to scrape a web page, but in order to POST data I need a web session ID, such as
web_session = HQJ3G1GPAAHRZGFR
How do I obtain that ID?
My code so far is:
Private Sub test()
    Dim postData As String = "web_session=HQJ3G1GPAAHRZGFR&intext=O&term_code=201210&search_type=A&keyword=&kw_scope=all&kw_opt=all&subj_code=BIO&crse_numb=205&campus=*&instructor=*&instr_session=*&attr_type=*&mon=on&tue=on&wed=on&thu=on&fri=on&sat=on&sun=on&avail_flag=on" '/BANPROD/pkgyc_yccsweb.P_Results
    Dim tempCookie As New CookieContainer
    Dim encoding As New UTF8Encoding
    Dim byteData As Byte() = encoding.GetBytes(postData)
    System.Net.ServicePointManager.SecurityProtocol = Net.SecurityProtocolType.Ssl3
    Try
        tempCookie.GetCookies(New Uri("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results"))
        'postData="web_session=" & tempCookie.
        Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results"), HttpWebRequest)
        postReq.Method = "POST"
        postReq.KeepAlive = True
        postReq.CookieContainer = tempCookie
        postReq.ContentType = "application/x-www-form-urlencoded"
        postReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; Media Center PC 4.0; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
        postReq.ContentLength = byteData.Length
        Dim postreqstream As Stream = postReq.GetRequestStream
        postreqstream.Write(byteData, 0, byteData.Length)
        postreqstream.Close()
        Dim postresponse As HttpWebResponse
        postresponse = DirectCast(postReq.GetResponse, HttpWebResponse)
        tempCookie.Add(postresponse.Cookies)
        Dim postresreader As New StreamReader(postresponse.GetResponseStream)
        Dim thepage As String = postresreader.ReadToEnd
        MsgBox(thepage)
    Catch ex As WebException
        MsgBox(ex.Status.ToString & vbNewLine & ex.Message.ToString)
    End Try
End Sub
Answer 0 (score: 2)
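The problem is that tempCookie.GetCookies() doesn't do what you think it does. It never contacts the server; all it does is filter the cookies already held in the container down to those that apply to the supplied URL. What you need to do instead is first request a page that will hand you the session token, and only then make the actual request for your data. So first request the page at P_Search, then reuse the CookieContainer that was bound to that request and POST to P_Results.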
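To make the GetCookies() point concrete, here is a small offline sketch. The cookie name `demo_session` and its value are made up purely for illustration; in real use the container is filled by the server's Set-Cookie headers, not by hand.

```vbnet
Imports System
Imports System.Net

Module GetCookiesDemo
    Sub Main()
        Dim resultsUri As New Uri("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results")
        Dim jar As New CookieContainer()

        ' No request has been made yet, so the container is empty and
        ' GetCookies has nothing to return for this URL.
        Console.WriteLine(jar.GetCookies(resultsUri).Count)

        ' Hypothetical cookie, added by hand only to show the filtering.
        jar.Add(New Cookie("demo_session", "ABC123", "/", "taylor.yc.edu"))

        ' The cookie applies to the results URL...
        Console.WriteLine(jar.GetCookies(resultsUri).Count)

        ' ...but not to an unrelated host.
        Console.WriteLine(jar.GetCookies(New Uri("https://example.com/")).Count)
    End Sub
End Module
```

Calling GetCookies() on a fresh container, as the question's code does, therefore yields an empty collection and sends nothing over the wire.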
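Rather than the HttpWebRequest object, though, let me point you to the WebClient class and my post here about extending it to support cookies. You'll find that you can simplify your code considerably. Below is a complete VB2010 WinForms application that shows this. If you still want to use the HttpWebRequest object, this should at least give you an idea of what needs to be done: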
Option Strict On
Option Explicit On
Imports System.Net

Public Class Form1
    Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
        ' Create our webclient
        Using WC As New CookieAwareWebClient()
            ' Set SSLv3
            System.Net.ServicePointManager.SecurityProtocol = Net.SecurityProtocolType.Ssl3
            ' Create a session, ignore what is returned
            WC.DownloadString("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Search")
            ' POST our actual data and get the results
            Dim S = WC.UploadString("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results", "POST", "term_code=201130&search_type=K&keyword=math")
            Trace.WriteLine(S)
        End Using
    End Sub
End Class

Public Class CookieAwareWebClient
    Inherits WebClient

    Private cc As New CookieContainer()
    Private lastPage As String

    Protected Overrides Function GetWebRequest(ByVal address As System.Uri) As System.Net.WebRequest
        Dim R = MyBase.GetWebRequest(address)
        If TypeOf R Is HttpWebRequest Then
            With DirectCast(R, HttpWebRequest)
                .CookieContainer = cc
                If Not lastPage Is Nothing Then
                    .Referer = lastPage
                End If
            End With
        End If
        lastPage = address.ToString()
        Return R
    End Function
End Class
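
If you would rather stay with HttpWebRequest, the same two-step flow might look roughly like the sketch below. It is untested against the live site and makes a few assumptions: the form fields are the ones from the WebClient example above, the Referer header is set only to mirror what CookieAwareWebClient does, and the SecurityProtocol line is left out (add it back if the server needs it).

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module TwoStepRequestSketch
    Private Const SearchUrl As String = "https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Search"
    Private Const ResultsUrl As String = "https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results"

    Sub Main()
        ' One container shared by both requests so the session cookie carries over.
        Dim jar As New CookieContainer()

        ' Step 1: GET the search page; the response's cookies are stored in the container.
        Dim getReq As HttpWebRequest = DirectCast(WebRequest.Create(SearchUrl), HttpWebRequest)
        getReq.CookieContainer = jar
        Using getResp As HttpWebResponse = DirectCast(getReq.GetResponse(), HttpWebResponse)
            ' The body is not needed; disposing the response releases the connection.
        End Using

        ' Step 2: POST the form data with the same container attached.
        Dim body As Byte() = Encoding.UTF8.GetBytes("term_code=201130&search_type=K&keyword=math")
        Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(ResultsUrl), HttpWebRequest)
        postReq.Method = "POST"
        postReq.CookieContainer = jar
        postReq.ContentType = "application/x-www-form-urlencoded"
        postReq.Referer = SearchUrl
        postReq.ContentLength = body.Length
        Using reqStream As Stream = postReq.GetRequestStream()
            reqStream.Write(body, 0, body.Length)
        End Using

        ' Read and print the results page.
        Using postResp As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(postResp.GetResponseStream())
                Console.WriteLine(reader.ReadToEnd())
            End Using
        End Using
    End Sub
End Module
```

The key design point is the single CookieContainer: because both requests share it, whatever session cookie the first response sets is sent automatically with the POST, so there is no need to hard-code web_session yourself.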