Getting and setting cookies in rvest

Date: 2017-10-20 09:02:40

Tags: r cookies web-scraping rvest httr

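How can I inspect my session cookies and specify them before making subsequent web requests?

I want to scrape a page, but I can't submit cookies.

I am using the rvest library.

My code: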

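    library(rvest)

    # Open a session; the response headers carry the Set-Cookie values
    WP <- html_session("http://www.wp.pl/")
    headers <- httr::headers(WP)
    cookies <- unlist(headers[names(headers) == "set-cookie"])

    # Split each Set-Cookie value into at most 4 "; "-separated crumbs
    crumbs <- stringr::str_split_fixed(cookies, "; ", 4)

    # method 1: split the first crumb of each cookie into name and value
    stringr::str_split_fixed(crumbs[, 1], "=", 2)

    # method 2: let httr report the session's cookies
    httr::cookies(WP)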

How do I set cookies for web scraping?

1 Answer:

Answer 0: (score: 0)

  1. Note that rvest is built on top of the httr library.
  2. For reasons I can't explain, this code did not work until I restarted RStudio.
  3. Here's some code that does the trick:

    library(httr)
    library(rvest)

    # Attach the cookies to the request explicitly with httr::set_cookies()
    httr::GET("http://www.wp.pl/",
        set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                    `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                    `__utmb` = "29983421.5.10.1413649536",
                    `__utmc` = "29983421",
                    `__utmt` = "1",
                    `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)")) %>%
        read_html() %>%           # Sample rvest code: parse the response as HTML
        html_table(fill = TRUE)   # Sample rvest code: extract the tables on the page
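To cover the "inspect, then send back" part of the question, here is a minimal sketch, not a definitive implementation. The helper names `parse_set_cookie` and `get_with_cookies` are hypothetical (they are not part of rvest or httr), and the code assumes httr is installed. The first helper does in base R what "method 1" in the question does with stringr; the second reads the cookies from a first response with `httr::cookies()` and passes them on through the `.cookies` argument of `httr::set_cookies()`.

```r
# Hypothetical helper: turn raw Set-Cookie header values into a named
# character vector of name/value pairs, dropping attributes such as
# Path, Domain or Expires.
parse_set_cookie <- function(set_cookie) {
  first_crumb <- sub(";.*$", "", set_cookie)   # keep only "name=value"
  name  <- sub("=.*$", "", first_crumb)        # text before the first "="
  value <- sub("^[^=]*=", "", first_crumb)     # text after the first "="
  stats::setNames(value, name)
}

# Hypothetical helper: request a page once, read the cookies the server
# set, then repeat the request with those cookies sent explicitly.
get_with_cookies <- function(url) {
  first <- httr::GET(url)
  jar <- httr::cookies(first)   # data frame with `name` and `value` columns
  httr::GET(
    url,
    httr::set_cookies(.cookies = stats::setNames(jar$value, jar$name))
  )
}

# Inspecting cookies from header strings (no network access needed):
parse_set_cookie(c("a=1; Path=/; HttpOnly", "sid=abc=def; Domain=.wp.pl"))
```

As far as I know, httr already keeps cookies between requests to the same host within one R session, so the explicit round trip in `get_with_cookies` should mainly matter when you want to edit the cookies or carry them over to a different session.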