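I am trying to modify VB.NET 2005 code in a production clinician software suite. Until the state insurance website was recently updated, the program successfully used screen scraping to log in to the site with the user's login information, and used HttpWebRequest to upload, download, and so on. Most of the work is done with HttpWebRequest and HttpWebResponse. Downloading requires SOAP, but all of this had been working for years before I took it over.
Last week the state website changed significantly, and the state agency is not really working with me, so I am on my own. When I view the page source, this is in the page body: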
<form method="post" action="/hcp/Default.aspx?alias=www.ohcaprovider.com/hcp/provider" onsubmit="javascript:return WebForm_OnSubmit();" id="Form" enctype="multipart/form-data" autocomplete="off">
The first difference I noticed is that the first page now posts back to itself; we used to post the parameters on the end of the next page's URL.
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
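Also, the first page only asks for a login name, and whatever you type in manually takes you to the next page (since this is a postback, I assume there is also a redirect). However, when I set up the HttpWebRequest, it always gives me status 200, and the response is the portal's default page (which is also the first page).
I have really been researching and looking for an answer. I am new to posting on forums, and I would very much welcome, and need, some help.
I installed Firebug and noticed that when I post manually, the POST appears to be laid out as multipart/form-data. I tried to replicate that with the HttpWebRequest, but it gives me nothing except status 200, and the response is the default page. Below I will try to piece the code together, since it lives in different OOP sections.
Basically I set up the HttpWebRequest, add headers, GET the page, scrape the __VIEWSTATE, set up the multipart form, set up the POST HttpWebRequest, and POST; then I do not get the results I expect. I am not sure what is going wrong, or whether one (or more) parts of this code are working at all. Thanks again for your help.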
Dim lsViewState As String = "__VIEWSTATE"" value="""
Try
'Section of code to get the upload form GET
chwrRequest = WebRequest.Create("https://www.ohcaprovider.com/hcp/Default.aspx?alias=www.ohcaprovider.com/hcp/provider")
chwrRequest.Method = "GET"
chwrRequest.KeepAlive = True
chwrRequest.CookieContainer = cckcCookieContainer
' Configure the web request to work with a proxy, like ACT
If pobjProxy Is Nothing Then
pobjProxy = System.Net.WebRequest.DefaultWebProxy
pobjProxy.Credentials = System.Net.CredentialCache.DefaultCredentials
End If
chwrRequest.Proxy = pobjProxy
'ADD Headers
chwrRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0"
chwrRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
chwrRequest.Headers.Add("Accept-Language", "en")
chwrRequest.Headers.Add("Accept-Charset", "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1")
chwrRequest.KeepAlive = True
'Get Page
chrsResponse = chwrRequest.GetResponse()
cstmStream = chrsResponse.GetResponseStream()
lsResp = CSubmitterUtils.GetStreamContent(cstmStream)
cstmStream.Close()
chrsResponse.Close()
CSubmitterUtils.WriteFileContent(psSaveAs, lsResp) ' Writes to file for debug purposes
'Store cookie Date
fsCookieData = cckcCookieContainer.GetCookieHeader(New Uri(OHCA_WEB_NEW))
'Section of code to do fill form and upload file SCRAPE for viewSTATE
Dim lnViewStateURLIndex As Integer = csResp.IndexOf(lsViewState)
If lnViewStateURLIndex < 0 Then
WriteLog("ViewState not found")
lbReturn = False
End If
Dim lnStartIndex As Integer = lnViewStateURLIndex + lsViewState.Length
Dim lnEqualIndex As Integer = csResp.IndexOf("=", lnStartIndex)
Dim lsViewStateContents As String = csResp.Substring(lnStartIndex, lnEqualIndex - lnStartIndex)
'Setup to POST
chwrRequest = WebRequest.Create(psUrl)
chwrRequest.Method = "POST"
chwrRequest.KeepAlive = True
chwrRequest.CookieContainer = cckcCookieContainer
' Configure the web request to work with a proxy, like ACT
If pobjProxy Is Nothing Then
pobjProxy = System.Net.WebRequest.DefaultWebProxy
pobjProxy.Credentials = System.Net.CredentialCache.DefaultCredentials
End If
chwrRequest.Proxy = pobjProxy
'ADD Headers
chwrRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0"
chwrRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
chwrRequest.Headers.Add("Accept-Language", "en")
chwrRequest.Headers.Add("Accept-Charset", "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1")
chwrRequest.KeepAlive = True
chwrRequest.AllowAutoRedirect = False
'Setup multipart/form
SetupLogonFileSubmit(lsViewStateContents)
Dim lmpBuffer As MultiPartBuffer
Dim lsContentType As String = "Content-Disposition: form-data; name="
Dim csBoundary As String = "------------------------------" & DateTime.Now.Ticks.ToString("x")
lmpBuffer.ContentTypeHeader = "multipart/form-data; boundary=" & csBoundary.Substring(2)
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__EVENTTARGET")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__EVENTARGUMENT")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__LASTFOCUS")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__VIEWSTATE")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine(lsViewStateContents + "=")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__VIEWSTATEENCRYPTED")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "dnn$ctr1842$Login$UserIdCmnTextBox$Control")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine(psLogName)
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "dnn$ctr1842$Login$LoginCmnButton")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("Log In")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "ScrollTop")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("")
lmpBuffer.WriteLine(csBoundary)
lmpBuffer.WriteLine(lsContentType + "__dnnVariable")
lmpBuffer.WriteLine()
lmpBuffer.WriteLine("{""__scdoff"":""1""}")
lmpBuffer.CloseBuffer()
Dim lsMpContent As String = lmpBuffer.ToString()
chwrRequest.ContentLength = lsMpContent.Length
chwrRequest.ContentType = lmpBuffer.HttpContentTypeHeader
Dim lbyBytesBuff As Byte()
lbyBytesBuff = Encoding.UTF8.GetBytes(lsMpContent)
cstmStream = chwrRequest.GetRequestStream()
cstmStream.Write(lbyBytesBuff, 0, lbyBytesBuff.Length)
cstmStream.Close()
'Get the Response
chrsResponse = chwrRequest.GetResponse()
'Put it in a stream
cstmStream = chrsResponse.GetResponseStream()
If chrsResponse.StatusCode = HttpStatusCode.OK Or chrsResponse.StatusCode = HttpStatusCode.Found Then
lsResp = CSubmitterUtils.GetStreamContent(cstmStream)
cstmStream.Close()
Else
lsResp = ""
End If
chrsResponse.Close()
CSubmitterUtils.WriteFileContent(psSaveAs, lsResp) ' Previously this was then used to move on to the next page for scraping/posting
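Answer 0: (score: 0)
I was able to use Firebug to help me fix the multipart form. One of the problems was that I needed to put quotes around the field name. We also added a cookie container to the setup of the POST. After that, to keep moving through the pages, we grabbed the parameters and built the multipart form for each one. We have made it to the last page. I will post that problem as a new question. Thanks, everyone, for your help.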
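For readers hitting the same wall, here is a minimal sketch of what a multipart/form-data body with quoted field names can look like. It is not the code from the application: the module name, the `BuildMultipartBody` helper, and the placeholder values are hypothetical, and only the field names are taken from the question. It also assumes the usual multipart layout in which each part starts with "--" plus the boundary token, the body ends with the boundary followed by "--", and the content length is the number of encoded bytes rather than the string length.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module MultipartSketch

    ' Hypothetical helper: builds a multipart/form-data body from name/value pairs.
    ' psBoundary is the bare boundary token, without the leading "--".
    Function BuildMultipartBody(ByVal psBoundary As String, _
                                ByVal poFields As List(Of KeyValuePair(Of String, String))) As String
        Dim lsbBody As New StringBuilder()
        For Each loField As KeyValuePair(Of String, String) In poFields
            ' Each part opens with "--" followed by the boundary token.
            lsbBody.Append("--").Append(psBoundary).Append(vbCrLf)
            ' The field name is wrapped in double quotes.
            lsbBody.Append("Content-Disposition: form-data; name=""")
            lsbBody.Append(loField.Key).Append("""").Append(vbCrLf)
            ' A blank line separates the part headers from the value.
            lsbBody.Append(vbCrLf)
            lsbBody.Append(loField.Value).Append(vbCrLf)
        Next
        ' The closing delimiter carries a trailing "--".
        lsbBody.Append("--").Append(psBoundary).Append("--").Append(vbCrLf)
        Return lsbBody.ToString()
    End Function

    Sub Main()
        ' One boundary token, generated once and reused for every part.
        Dim lsBoundary As String = "---------------------------" & DateTime.Now.Ticks.ToString("x")

        ' Placeholder values for illustration only.
        Dim loFields As New List(Of KeyValuePair(Of String, String))
        loFields.Add(New KeyValuePair(Of String, String)("__EVENTTARGET", ""))
        loFields.Add(New KeyValuePair(Of String, String)("__VIEWSTATE", "placeholder-viewstate"))
        loFields.Add(New KeyValuePair(Of String, String)("dnn$ctr1842$Login$UserIdCmnTextBox$Control", "someuser"))

        Dim lsBody As String = BuildMultipartBody(lsBoundary, loFields)
        Dim lbyBody As Byte() = Encoding.UTF8.GetBytes(lsBody)

        ' These are the values that would be assigned to the request's
        ' ContentType and ContentLength before writing lbyBody to the request stream.
        Console.WriteLine("Content-Type: multipart/form-data; boundary=" & lsBoundary)
        Console.WriteLine("Content-Length: " & lbyBody.Length.ToString())
        Console.WriteLine(lsBody)
    End Sub

End Module
```

Printing the body this way makes it easy to compare it line by line against the POST that Firebug records for a manual login.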
Answer 1: (score: 0)
Here is most of the solution. If you still have questions, I will try to help.
If IsNothing(chcjCookieJar) Then
chcjCookieJar = New CookieContainer
End If
Put this before your first GET to the website.
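As a rough illustration of why the container has to exist before the first GET, the sketch below creates it once and attaches the same instance to every request, so cookies set by the GET are sent back with the POST. The module name, `CreateRequest`, and `moCookieJar` are hypothetical names, not the ones from our application. Nothing is sent over the network here; the requests are only constructed.

```vbnet
Imports System
Imports System.Net

Module CookieJarSketch

    ' One container for the whole session (hypothetical field name).
    Private moCookieJar As CookieContainer

    ' Hypothetical helper: creates a request that shares the session's cookie container.
    Function CreateRequest(ByVal psUrl As String, ByVal psMethod As String) As HttpWebRequest
        ' Create the container only if it does not exist yet.
        If moCookieJar Is Nothing Then
            moCookieJar = New CookieContainer()
        End If
        Dim loRequest As HttpWebRequest = CType(WebRequest.Create(psUrl), HttpWebRequest)
        loRequest.Method = psMethod
        loRequest.CookieContainer = moCookieJar
        Return loRequest
    End Function

    Sub Main()
        Dim lsUrl As String = "https://www.ohcaprovider.com/hcp/Default.aspx?alias=www.ohcaprovider.com/hcp/provider"
        Dim loGet As HttpWebRequest = CreateRequest(lsUrl, "GET")
        Dim loPost As HttpWebRequest = CreateRequest(lsUrl, "POST")
        ' Both requests point at the same container instance.
        Console.WriteLine(loGet.CookieContainer Is loPost.CookieContainer)
    End Sub

End Module
```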
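I pieced together all of the code I used for the example in the question, so it would take me hours to do the same for the solution.
We have a class called MultiPartBuffer.
Basically, after we GET the first page, we set up the multipart form to POST: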
lmpBuffer = ConstructLogonFileBuffer(.LoginLogAction)
If Not IsNothing(lmpBuffer) Then
lsMpContent = lmpBuffer.ToString()
chwrRequest.ContentType = lmpBuffer.HttpContentTypeHeader
lbyBytesBuff = Encoding.UTF8.GetBytes(lsMpContent)
chwrRequest.ContentLength = lbyBytesBuff.Length
cstmStream = chwrRequest.GetRequestStream()
cstmStream.Write(lbyBytesBuff, 0, lbyBytesBuff.Length)
cstmStream.Close()
Else
WriteLog("Error writing to buffer.")
End If
Then, after setting up the request handle, we submit:
'Get the Response
chrsResponse = chwrRequest.GetResponse()
'Put it in a stream
cstmStream = chrsResponse.GetResponseStream()
'Write to Log, displayed on screen
WriteLog("ResponseCode: " + chrsResponse.StatusCode.ToString())
If chrsResponse.StatusCode = HttpStatusCode.OK Then
lsResp = CSubmitterUtils.GetStreamContent(cstmStream)
cstmStream.Close()
Else
lsResp = ""
End If
chrsResponse.Close()
We save the result:
CSubmitterUtils.WriteFileContent(psSaveAs, lsResp)
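This is how we construct the multipart form. The .Variables (psContentType) are the values we got from Firebug by looking at the POST result. Where a field has one, the call also takes a value (psFileContent), which we get from the cookie, from screen scraping, or from our user information.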
lmpBuffer.AddFilePartGeneric(.EventTarget)
lmpBuffer.AddFilePartGeneric(.EventArgument)
lmpBuffer.AddFilePartGeneric(.LastFocus)
lmpBuffer.AddFilePartGeneric(.ViewStatePost, csViewStateContents)
lmpBuffer.AddFilePartGeneric(.ViewStateEncrypted)
lmpBuffer.AddFilePartGeneric(.LoginControl, TheLogin)
lmpBuffer.AddFilePartGeneric(.LoginLogAction, psLogAction)
lmpBuffer.AddFilePartGeneric(.ScrollTop)
lmpBuffer.AddFilePartGeneric(.dnnVariablePost)
lmpBuffer.CloseBuffer()
Public Sub AddFilePartGeneric(ByVal psContentType As String, Optional ByVal psFileContent As String = "")
AddFilePart("Content-Disposition: form-data; name=", "", psContentType, psFileContent, Nothing)
End Sub
Note: this Sub is not labeled well (we reuse the code, so the parameter names are not very friendly).
Public Sub AddFilePart(ByVal psFieldName As String, ByVal psFileName As String, ByVal contentType As String, ByVal fileContent As String, ByVal contentTransferEncoding As String)
Try
WriteLine("------------------------------" & DateTime.Now.Ticks.ToString("x"))
If Not contentType Is Nothing Then
WriteLine(psFieldName + contentType)
End If
WriteLine()
WriteLine(fileContent)
Catch ex As Exception
Cache.WriteException(ex.ToString)
End Try
End Sub
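When you step through the code, you should be able to inspect lsMpContent, and it should match exactly what Firebug shows in the POST of the multipart form.
Using the cookie jar, you can repeat this process for the subsequent pages; you just need to set up the multipart form correctly and add any parameters to the request handle.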
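Each subsequent page usually carries its own hidden fields such as __VIEWSTATE, which have to be scraped from the previous response before the next form can be built. Below is a minimal sketch of one way to pull a hidden field's value out of the HTML with a regular expression. It is not our production code: `GetHiddenField` is a hypothetical helper, the sample HTML and its value are made up, and the pattern assumes the name attribute appears before the value attribute inside the input tag.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HiddenFieldSketch

    ' Hypothetical helper: returns the value attribute of the named input,
    ' or Nothing when the field is not found.
    ' Assumes name="..." comes before value="..." within the same tag.
    Function GetHiddenField(ByVal psHtml As String, ByVal psName As String) As String
        Dim lsPattern As String = "<input[^>]*name=""" & Regex.Escape(psName) & """[^>]*value=""([^""]*)"""
        Dim loMatch As Match = Regex.Match(psHtml, lsPattern, RegexOptions.IgnoreCase)
        If loMatch.Success Then
            ' Group 1 holds everything between the quotes of the value attribute.
            Return loMatch.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Made-up sample markup for illustration only.
        Dim lsHtml As String = "<input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""dDwxMjM0NTY3ODk7Oz4="" />"
        Console.WriteLine(GetHiddenField(lsHtml, "__VIEWSTATE"))
    End Sub

End Module
```

Capturing up to the closing quote keeps any trailing "=" padding in the value intact, so nothing has to be appended again when the field is written into the multipart form.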
Please note: it has been many days since I worked on this, so I may not have the Firebug terminology exactly right. I hope this helps.
Answer 2: (score: -1)
.NET sites tend to be harder to scrape. This blog post may help:
http://blog.screen-scraper.com/2008/06/04/scraping-aspnet-sites/
If you end up hitting a wall on the project, feel free to contact us directly through screen-scraper.com.