我正在使用R,版本3.3.1。我正在尝试从以下网站废弃数据:
http://plovila.pomorstvo.hr/
如您所见,它是一种HTML表单。我想选择"Tip objekta"
(对象类型),例如“Jahta”(Yacht)并输入"NIB"
(这是一个整数,例如93567)。你可以试试自己;只需选择“Jahta”并在NIB字段中键入93567。
方法为POST
,类型为application/x-www-form-urlencoded
。我尝试了3种不同的方法:使用rvest,POST(httr包)和postForm(Rcurl)。我的rvest代码是:
session <- html_session("http://plovila.pomorstvo.hr")
form <- html_form(session)[[1]]
form <- set_values(form, `ctl00$Content_FormContent$uiTipObjektaDropDown` = 2,
`ctl00$Content_FormContent$uiOznakaTextBox` = "",
`ctl00$Content_FormContent$uiNibTextBox` = 93567)
x <- submit_form(session, form)
如果我运行此代码并获得200状态,但我不明白如何获取该表:
其他步骤是提交Detalji
按钮并获取其他信息,但我无法从x提交输出中看到任何信息。
答案 0 :(得分:3)
我使用curlconverter
包将&#34;复制为cURL&#34;来自XHR POST
请求的数据并将其自动转换为:
httr::VERB(verb = "POST", url = "http://plovila.pomorstvo.hr/",
httr::add_headers(Origin = "http://plovila.pomorstvo.hr",
`Accept-Encoding` = "gzip, deflate",
`Accept-Language` = "en-US,en;q=0.8",
`X-Requested-With` = "XMLHttpRequest",
Connection = "keep-alive",
`X-MicrosoftAjax` = "Delta=true",
Pragma = "no-cache", `User-Agent` = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.34 Safari/537.36",
Accept = "*/*", `Cache-Control` = "no-cache",
Referer = "http://plovila.pomorstvo.hr/",
DNT = "1"), httr::set_cookies(ASP.NET_SessionId = "b4b123vyqxnt4ygzcykwwvwr"),
body = list(`ctl00$uiScriptManager` = "ctl00$Content_FormContent$ctl00|ctl00$Content_FormContent$uiPretraziButton",
ctl00_uiStyleSheetManager_TSSM = ";|635908784800000000:d29ba49:3cef4978:9768dbb9",
`ctl00$Content_FormContent$uiTipObjektaDropDown` = "2",
`ctl00$Content_FormContent$uiImeTextBox` = "",
`ctl00$Content_FormContent$uiNibTextBox` = "93567",
`__EVENTTARGET` = "", `__EVENTARGUMENT` = "",
`__LASTFOCUS` = "", `__VIEWSTATE` = "/wEPDwUKMTY2OTIzNTI1MA9kFgJmD2QWAgIDD2QWAgIBD2QWAgICD2QWAgIDD2QWAmYPZBYIAgEPZBYCZg9kFgZmD2QWAgIBDxAPFgYeDURhdGFUZXh0RmllbGQFD05heml2VGlwT2JqZWt0YR4ORGF0YVZhbHVlRmllbGQFDElkVGlwT2JqZWt0YR4LXyFEYXRhQm91bmRnZBAVBAAHQnJvZGljYQVKYWh0YQbEjGFtYWMVBAEwATEBMgEzFCsDBGdnZ2cWAQICZAIBDw8WAh4HVmlzaWJsZWdkFgICAQ8PFgIfA2dkZAICDw8WAh8DaGQWAgIBDw8WBB4EVGV4dGUfA2hkZAIHDzwrAA4CABQrAAJkFwEFCFBhZ2VTaXplAgoBFgIWCw8CCBQrAAhkZGRkZDwrAAUBBAUHSWRVcGlzYTwrAAUBBAUISWRVbG9za2E8KwAFAQQFBlNlbGVjdGRlFCsAAAspelRlbGVyaWsuV2ViLlVJLkdyaWRDaGlsZExvYWRNb2RlLCBUZWxlcmlrLldlYi5VSSwgVmVyc2lvbj0yMDEzLjMuMTExNC40MCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj0xMjFmYWU3ODE2NWJhM2Q0ATwrAAcACyl1VGVsZXJpay5XZWIuVUkuR3JpZEVkaXRNb2RlLCBUZWxlcmlrLldlYi5VSSwgVmVyc2lvbj0yMDEzLjMuMTExNC40MCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj0xMjFmYWU3ODE2NWJhM2Q0ARYCHgRfZWZzZGQWBB4KRGF0YU1lbWJlcmUeBF9obG0LKwQBZGZkAgkPZBYCZg9kFgJmD2QWIAIBD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIDD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIFD2QWAgIDDzwrAAgAZAIHD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIJD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAILD2QWBgIDDxQrAAI8KwAIAGRkAgUPFCsAAjwrAAgAZGQCBw8UKwACPCsACABkZAIND2QWBgIDDxQrAAI8KwAIAGRkAgUPFCsAAjwrAAgAZGQCBw8UKwACPCsACABkZAIPD2QWAgIDDxQrAAI8KwAIAGRkAhEPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhMPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhUPZBYCAgMPPCsACABkAhcPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhkPPCsADgIAFCsAAmQXAQUIUGFnZVNpemUCBQEWAhYLZGRlFCsAAAsrBAE8KwAHAAsrBQEWAh8FZGQWBB8GZR8HCysEAWRmZAIbDzwrAA4CABQrAAJkFwEFCFBhZ2VTaXplAgUBFgIWC2RkZRQrAAALKwQBPCsABwALKwUBFgIfBWRkFgQfBmUfBwsrBAFkZmQCHQ88KwAOAgAUKwACZBcBBQhQYWdlU2l6ZQIFARYCFgtkZGUUKwAACysEATwrAAcACysFARYCHwVkZBYEHwZlHwcLKwQBZGZkAiMPPCsADgIAFCsAAmQXAQUIUGFnZVNpemUCBQEWAhYLZGRlFCsAAAsrBAE8KwAHAAsrBQEWAh8FZGQWBB8GZR8HCysEAWRmZAILD2QWAmYPZBYCZg9kFgICAQ88KwAOAgAUKwACZBcBBQhQYWdlU2l6ZQIFARYCFgtkZGUUKwAACysEATwrAAcACysFARYCHwVkZBYEHwZlHwcLKwQBZGZkZIULy2JISPTzELAGqWDdBkCVyvvKIjo/wm/iG9PT1dlU",
`__VIEWSTATEGENERATOR` = "CA0B0334",
`__PREVIOUSPAGE` = "jGgYHmJ3-6da6PzGl9Py8IDr-Zzb75YxIFpHMz4WQ6iQEyTbjWaujGRHZU-1fqkJcMyvpGRkWGStWuj7Uf3NYv8Wi0KSCVwn435kijCN2fM1",
`__ASYNCPOST` = "true",
`ctl00$Content_FormContent$uiPretraziButton` = "Pretraži"),
encode = "form") -> res
您可以通过以下方式查看结果:
content(res, as="text") # returns raw HTML
或
content(res, as="parsed") # returns something you can use with `rvest` / `xml2`
不幸的是,这是另一个无用的SharePoint网站,&#34; eGov&#34;世界各地的网站已经成为一件好事。这意味着您必须进行反复试验以确定哪些参数是必要的,因为它几乎在每个站点上都有所不同。我尝试了一个最小的设置无济于事。
您甚至可能必须首先向主站点发出GET
请求以建立会话。
但这应该让你朝着正确的方向前进。