Using R

Time: 2018-06-29 16:16:19

Tags: r

I am using this code to scrape data from TripAdvisor.

# One-time installation; afterwards only the library() calls are needed
install.packages(c("rvest", "xmlparsedata", "xml2", "XML"))
library(rvest)
library(xmlparsedata)
library(xml2)
library(XML)

url.1 <- "https://www.tripadvisor.ie/Restaurant_Review-g186605-d4046860-Reviews-The_Stage_Door_Cafe-Dublin_County_Dublin.html"

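# Select the review containers on the page
reviews <- url.1 %>%
  read_html() %>%
  html_nodes("#REVIEWS .innerBubble")

# Review ID, from the id attribute of the review's title link
id <- reviews %>%
  html_node(".quote a") %>%
  html_attr("id")

# Review title
quote <- reviews %>%
  html_node(".quote span") %>%
  html_text()

# Star rating: strip " of 5 stars" from the alt text and convert to integer
rating <- reviews %>%
  html_node(".rating .rating.bubble") %>%
  html_attr("alt") %>%
  gsub(" of 5 stars", "", .) %>%
  as.integer()

# Review date, parsed from the title attribute with format "%b %d, %Y"
date <- reviews %>%
  html_node(".ratingDate .relativeDate") %>%
  html_attr("title") %>%
  strptime("%b %d, %Y") %>%
  as.POSIXct()

# Review text; this is the part that is cut off where the "More" link appears
review <- reviews %>%
  html_node(".entry .partial_entry") %>%
  html_text()

a.1 <- data.frame(id, quote, rating, date, review, stringsAsFactors = FALSE)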
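The rating and date steps above can be checked offline on plain strings. The sketch below uses base R only; the sample `alt_text` and `title` values are hypothetical, chosen to match the `" of 5 stars"` suffix and the `"%b %d, %Y"` format the scraping code expects. One detail worth knowing: `%b` matches month names in the current time locale, so on a machine with a non-English locale `strptime()` will typically return `NA` for English dates unless `LC_TIME` is switched first.

```r
# Hypothetical sample values in the shapes the pipeline above expects
# (not scraped from the page)
alt_text <- c("4 of 5 stars", "5 of 5 stars", NA)
title    <- c("June 29, 2018", "Jun 1, 2018", NA)

# Rating: drop the " of 5 stars" suffix, then convert; missing nodes stay NA
rating <- as.integer(gsub(" of 5 stars", "", alt_text))

# Dates: %b uses the current locale's month names, so switch LC_TIME to "C"
# (English names) while parsing, then restore the previous setting
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
date <- as.POSIXct(strptime(title, "%b %d, %Y"))
invisible(Sys.setlocale("LC_TIME", old_locale))

print(rating)
print(format(date, "%Y-%m-%d"))
```

Missing nodes (`NA`) pass through both conversions as `NA` rather than raising an error, which keeps the columns aligned when `data.frame()` combines them.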

The problem I'm facing is the "More" button in the reviews: because RSelenium has been archived on CRAN, I can't use the RSelenium package to click it.

install.packages("seleniumPipes")
library(seleniumPipes)
install.packages("devtools")
library(devtools)
# Install from CRAN Archive source tarballs. With repos = NULL, install.packages()
# does not resolve dependencies, so each one is installed first, in order.
ra <- "https://cran.r-project.org/src/contrib/Archive/rappdirs/rappdirs_0.3.tar.gz"
install.packages(ra, repos=NULL, type="source", dependencies = TRUE)
library(rappdirs)
sem <- "https://cran.r-project.org/src/contrib/Archive/semver/semver_0.1.0.tar.gz"
install.packages(sem, repos=NULL, type="source", dependencies = TRUE)
library(semver)
bin <- "https://cran.r-project.org/src/contrib/Archive/binman/binman_0.0.7.tar.gz"
install.packages(bin, repos=NULL, type="source", dependencies = TRUE)
library(binman)
sub <- "https://cran.r-project.org/src/contrib/subprocess_0.8.2.tar.gz"
install.packages(sub, repos=NULL, type="source")
library(subprocess)
wd <- "https://cran.r-project.org/src/contrib/Archive/wdman/wdman_0.2.2.tar.gz"
install.packages(wd, repos=NULL, type="source", dependencies = TRUE)
library(wdman)
packageurl <- "https://cran.r-project.org/src/contrib/Archive/RSelenium/RSelenium_1.6.2.tar.gz"
install.packages(packageurl, repos=NULL, type="source")
library(RSelenium)
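The same chain of versions can be installed with `remotes::install_version()`, which fetches a specific version from the CRAN Archive so the tarball URLs do not have to be written out (`devtools`, loaded in the script above, re-exports the same function). A minimal sketch, assuming the `remotes` package is installed; the helper name `install_archived` is hypothetical, and the version numbers are the ones in the URLs above.

```r
# Hypothetical helper: install archived CRAN packages one by one, in the
# order given, via remotes::install_version(). Assumes 'remotes' is installed.
install_archived <- function(versions, repos = "https://cran.r-project.org") {
  for (pkg in names(versions)) {
    # Looks up the requested version (current or Archive) and installs it
    remotes::install_version(pkg, version = versions[[pkg]], repos = repos)
  }
  invisible(names(versions))
}

# Versions taken from the tarball URLs above; dependencies come first
archived <- c(rappdirs   = "0.3",
              semver     = "0.1.0",
              binman     = "0.0.7",
              subprocess = "0.8.2",
              wdman      = "0.2.2",
              RSelenium  = "1.6.2")
```

Calling `install_archived(archived)` runs the installs in order. Whether these 2018-era versions still build on a current R installation is not guaranteed.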

I have tried installing all the archived packages manually, but to no avail: I can't get Selenium to start. I also tried running Selenium in Docker, with no luck.
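For the Docker route, RSelenium's Docker vignette starts a standalone Selenium container and connects to it on a mapped host port. A minimal sketch of that pattern, assuming Docker is on the `PATH` and RSelenium is installed; the helper name `docker_selenium`, the port `4445` and the image tag are illustrative choices. The older `2.53.0` tag is assumed because current Selenium images implement only the W3C WebDriver protocol, which older RSelenium releases may not handle. Compared with the `remoteDriver()` call in the question, two things differ: the image runs Firefox, so `browserName` must be `"firefox"` rather than `"phantomjs"`, and if Docker runs inside a VM (for example Docker Toolbox), `host` has to be the VM's IP address rather than `"localhost"`.

```r
# Hypothetical helper: start a Selenium container and connect RSelenium to it.
# Assumes Docker is on the PATH and RSelenium is installed.
docker_selenium <- function(host = "localhost",
                            host_port = 4445L,
                            image = "selenium/standalone-firefox:2.53.0") {
  # Publish the container's Selenium port (4444) on host_port
  status <- system2("docker",
                    c("run", "-d", "-p", paste0(host_port, ":4444"), image))
  if (status != 0) stop("docker run failed with exit status ", status)
  Sys.sleep(5)  # crude wait for the Selenium server inside the container

  # The image provides Firefox, so browserName must be "firefox", not "phantomjs"
  remDr <- RSelenium::remoteDriver(remoteServerAddr = host,
                                   port = as.integer(host_port),
                                   browserName = "firefox")
  remDr$open()
  remDr
}
```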

remDr <- RSelenium::remoteDriver(remoteServerAddr = "192.168.43.66",
                                 port = 4444L,
                                 browserName = "phantomjs")

This succeeds, but when I run

remDr$open()

I get the following error:

[1] "Connecting to remote server"
Error in checkError(res) :
  Undefined error in httr call. httr output: Failed to connect to 10.3.100.207 port 4444: Network is unreachable
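"Network is unreachable" means the R session could not reach the Selenium host at all, so this is a networking problem rather than an RSelenium one; note also that the address in the error, 10.3.100.207, differs from the `remoteServerAddr` passed above. A base-R sketch for checking whether anything accepts TCP connections on a host and port before calling `remDr$open()`; the helper name `port_open` is hypothetical.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened,
# FALSE otherwise. Base R only.
port_open <- function(host, port, timeout = 3) {
  con <- tryCatch(
    # A refused or unreachable connection raises an error (plus a warning,
    # which is muffled here); the error is turned into NULL
    suppressWarnings(socketConnection(host = host, port = port,
                                      blocking = TRUE, open = "r+",
                                      timeout = timeout)),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}
```

For example, if `port_open("192.168.43.66", 4444L)` returns `FALSE`, no Selenium server is reachable at that address from this machine, and `remDr$open()` will fail in the same way.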

Is there any other way to click the "More" button using the rvest package? RSelenium seems somewhat outdated.
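On its own, rvest only parses the HTML it downloads and does not run JavaScript, so it cannot click anything. Newer rvest releases (1.0.4 and later, which postdate this question) add `read_html_live()`, which drives a headless Chrome through the chromote package and provides a `$click()` method. A minimal sketch under those assumptions: Chrome or Chromium is installed, both CSS selectors are hypothetical and need checking against the live page, and the helper name `scrape_full_reviews` is illustrative.

```r
# Hypothetical helper built on rvest's live sessions (rvest >= 1.0.4).
# Assumes Chrome/Chromium is available to chromote; both selectors are
# hypothetical and must be checked against the current page markup.
scrape_full_reviews <- function(url,
                                more_css = "#REVIEWS .taLnk.ulBlueLinks",
                                text_css = "#REVIEWS .innerBubble .entry .partial_entry",
                                wait = 2) {
  sess <- rvest::read_html_live(url)

  # Click only if a "More" link is actually present on the page
  if (length(rvest::html_elements(sess, more_css)) > 0) {
    sess$click(more_css)
    Sys.sleep(wait)  # give the page's JavaScript time to expand the text
  }

  # Take a snapshot of the (hopefully expanded) review text
  rvest::html_text2(rvest::html_elements(sess, text_css))
}
```

Usage would be `full <- scrape_full_reviews(url.1)`. If a single click does not expand every review on the page, the click would need repeating per review, or each review's own page (linked from its title) could be scraped instead.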

Here is a screenshot of the "More" button:


0 answers