Scraping a secure (HTTPS) web page in R

Asked: 2016-09-02 03:28:04

Tags: r web-scraping rvest

I am trying to extract some reviews from a secure web page, as shown below:

# Attempt to extract information from an online secure page
library(rvest)
URL <- "https://www.bankbazaar.com/insurance/religare-health-insurance.html"
mainPage <- read_html(URL)
reviewsHTML <- html_nodes(mainPage, ".ellipsis_text")
reviewsHTML

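The code above returns {xml_nodeset (0)}. However, if I first save the page to my local system (using Ctrl + S) as "Religare Health Insurance.html" and then run the same extraction on the saved file, I am able to extract the reviews.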

# Attempt to extract information from an offline (saved) copy of the secure page
library(rvest)
URL <- "Religare Health Insurance.html"
mainPage <- read_html(URL)
reviewsHTML <- html_nodes(mainPage, ".ellipsis_text")
reviewsHTML
{xml_nodeset (20)}
[1] <span itemprop="description" class="ellipsis_text">I have taken my health insurance from Religare......

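Questions:

  1. Why do I get different behavior when extracting the same information from the online page and from the offline copy of the same page?
  2. How can I extract the same information in R without downloading the page first?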
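
One way to narrow down the first question is to check whether the reviews are part of the HTML the server sends or are added later by JavaScript in the browser. This is only a guess about the cause, not something confirmed for this site. The sketch below uses two made-up HTML strings (`served_html` and `rendered_html` are hypothetical stand-ins, not the real page) to show that `read_html` parses markup only and never runs scripts, so a node created by a script is invisible to `html_nodes`, while a browser-saved copy already contains it.

```r
library(rvest)

# Hypothetical stand-in for the HTML the server sends: the review container
# is empty and a script is supposed to fill it in inside a browser.
served_html <- '<html><body>
  <div id="reviews"></div>
  <script>
    var s = document.createElement("span");
    s.className = "ellipsis_text";
    s.textContent = "Sample review";
    document.getElementById("reviews").appendChild(s);
  </script>
</body></html>'

# Hypothetical stand-in for the browser-saved copy: the script has already
# run, so the span is present as ordinary markup.
rendered_html <- '<html><body>
  <div id="reviews"><span class="ellipsis_text">Sample review</span></div>
</body></html>'

# Count the nodes matching a CSS selector in an HTML string.
count_nodes <- function(html, css) {
  length(html_nodes(read_html(html), css))
}

served_count <- count_nodes(served_html, ".ellipsis_text")
rendered_count <- count_nodes(rendered_html, ".ellipsis_text")

print(served_count)
print(rendered_count)
```

For the real page, a similar check would be to search `as.character(mainPage)` for the string "ellipsis_text". If it does not appear as an element there, the reviews are probably inserted client-side, and getting them without saving the page would likely need either a tool that drives a real browser (RSelenium is one option) or the data request the page itself makes.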

0 answers:

No answers yet