I am trying to scrape a website that contains the following markup:
<td align="right">
<span id="ctl00_ContentPlaceHolder1_lblCount1">825 records found, </span>
Page
<input name="ctl00$ContentPlaceHolder1$txtCurrent1" type="text" value="1" maxlength="4" id="ctl00_ContentPlaceHolder1_txtCurrent1" style="width:30px;" />
of
<span id="ctl00_ContentPlaceHolder1_lblTotalPage1">83</span>
<input type="submit" name="ctl00$ContentPlaceHolder1$btnGo1" value="GO" id="ctl00_ContentPlaceHolder1_btnGo1" class="inputbtn" />
</td>
I am using the rvest package and tried the following code:
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform,`ctl00$ContentPlaceHolder1$txtCurrent1` = 2)
result <- submit_form(pgsession,filled_form)
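This does not return the next table from the website. How can I use this package to submit a value and get back the resulting HTML? I did some research, and perhaps I should use the R packages httr and RCurl for this instead.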
Answer 0 (score: 3)
I figured it out. The correct code is:
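# Start a browsing session; replace "url" with the address of the page
pgsession <- html_session("url")
# Parse the first form on the page
pgform <- html_form(read_html(pgsession))[[1]]
# Set the page-number text box to 2
filled_form <- set_values(pgform, `ctl00$ContentPlaceHolder1$txtCurrent1` = 2)
# Submit through the GO button, named explicitly
result <- submit_form(pgsession, filled_form, submit = 'ctl00$ContentPlaceHolder1$btnGo1')
# Parse the HTML of the returned page
case_home <- read_html(result)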
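For readers on rvest 1.0.0 or later, where `html_session()`, `set_values()`, and `submit_form()` were renamed to `session()`, `html_form_set()`, and `session_submit()`, the same steps might look roughly like the sketch below. The helper name `fetch_results_page` and its arguments are hypothetical and not part of the original answer. The sketch assumes the target form is the first one on the page. A likely reason for passing `submit` explicitly is that rvest otherwise falls back to the first submit button in the form, which may not be the GO button, but that is an assumption about this particular page.

```r
library(rvest)

# Hypothetical helper (not from the original answer): request a given
# results page by filling in the page-number box and pressing GO.
fetch_results_page <- function(url, page) {
  # Start a session so cookies and hidden form state are carried along
  pgsession <- session(url)

  # Assumption: the paging controls live in the first form on the page
  pgform <- html_form(read_html(pgsession))[[1]]

  # Field names containing `$` must be wrapped in backticks
  filled_form <- html_form_set(
    pgform,
    `ctl00$ContentPlaceHolder1$txtCurrent1` = as.character(page)
  )

  # Name the GO button explicitly rather than relying on the default button
  result <- session_submit(
    pgsession,
    filled_form,
    submit = "ctl00$ContentPlaceHolder1$btnGo1"
  )

  # Return the parsed HTML of the page that came back
  read_html(result)
}
```

Calling `fetch_results_page("url", 2)` with the real address in place of `"url"` should then correspond to `case_home` in the answer above.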