Using rvest submit_form to get the next page of results

Time: 2016-03-16 03:11:46

Tags: r web-scraping rcurl rvest httr

The website I am trying to scrape contains the following paging control:

    <td align="right">
                <span id="ctl00_ContentPlaceHolder1_lblCount1">825 records found, </span>
                Page
                <input name="ctl00$ContentPlaceHolder1$txtCurrent1" type="text" value="1" maxlength="4" id="ctl00_ContentPlaceHolder1_txtCurrent1" style="width:30px;" />
                of
                <span id="ctl00_ContentPlaceHolder1_lblTotalPage1">83</span>
                <input type="submit" name="ctl00$ContentPlaceHolder1$btnGo1" value="GO" id="ctl00_ContentPlaceHolder1_btnGo1" class="inputbtn" />
            </td>

I am using the rvest package and tried the following code:

    pgsession <- html_session(url)
    pgform <- html_form(pgsession)[[1]]
    filled_form <- set_values(pgform,`ctl00$ContentPlaceHolder1$txtCurrent1` = 2)
    result <- submit_form(pgsession,filled_form)

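This does not return the next table from the site. How can I use this package to submit the value and get back the resulting HTML? From some digging around, it looks like I might need to use httr or RCurl for this instead.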

1 Answer:

Answer 0: (score: 3)

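I figured it out. The correct code is:

    library(rvest)

    pgsession <- html_session(url)  # url: address of the results page
    pgform <- html_form(read_html(pgsession))[[1]]
    # Enter the target page number in the "Page" text box
    filled_form <- set_values(pgform, `ctl00$ContentPlaceHolder1$txtCurrent1` = 2)
    # Name the GO button explicitly so it is the one used for the submission
    result <- submit_form(pgsession, filled_form, submit = 'ctl00$ContentPlaceHolder1$btnGo1')
    case_home <- read_html(result)

The main difference from the code in the question is the submit argument, which names the GO button so that it is the one used to submit the form.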
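To collect every page rather than only page 2, the same calls can be wrapped in a loop. The following is a minimal sketch, not a tested solution for this particular site: fetch_page and fetch_all_pages are hypothetical helper names, the delay argument is an assumption added for politeness, and the total page count is assumed to be readable from the lblTotalPage1 span shown in the question. It also assumes the site accepts repeated submissions of the first page's form, which may not hold for every ASP.NET page.

```r
library(rvest)

# Hypothetical helper: submit the paging form for one page number and
# return the parsed HTML of the response.
fetch_page <- function(pgsession, page) {
  pgform <- html_form(read_html(pgsession))[[1]]
  filled_form <- set_values(pgform, `ctl00$ContentPlaceHolder1$txtCurrent1` = page)
  result <- submit_form(pgsession, filled_form,
                        submit = "ctl00$ContentPlaceHolder1$btnGo1")
  read_html(result)
}

# Hypothetical helper: return a list with one parsed document per page.
# `delay` (seconds between requests) is an assumption, not part of the answer.
fetch_all_pages <- function(url, delay = 1) {
  pgsession <- html_session(url)
  first <- read_html(pgsession)
  # Read the total page count from the span shown in the question
  total_node <- html_node(first, "#ctl00_ContentPlaceHolder1_lblTotalPage1")
  total <- as.integer(trimws(html_text(total_node)))
  # Pages 2..total; empty when there is only one page
  rest <- lapply(seq_len(total)[-1], function(p) {
    Sys.sleep(delay)
    fetch_page(pgsession, p)
  })
  c(list(first), rest)
}

# Usage (requires network access and the real address):
# pages <- fetch_all_pages(url)
```

Note that rvest 1.0.0 renamed these functions: html_session became session, set_values became html_form_set, and submit_form became session_submit. On a current rvest installation the old names still run but emit deprecation warnings.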