Question

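I am trying to use R to scrape a web page whose content is generated dynamically by JavaScript.

I found several clear tutorials, and they all end with the same step: running phantomjs through a system() call to create the html page.

Every time I execute this system() command, the R console freezes indefinitely and an empty html page is produced.

Does anyone know why this happens?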

#set working directory
setwd("/Users/thomasroelens/Documents/work/DeTijd/projects/current/2018/110718-mvp-db")

#url that should be scraped
url <- "https://web.archive.org/web/20110831155858/http://aws.amazon.com/ec2/pricing"

#name of the js file that phantomjs will run
connection <- "scrape.js"

#create the js file
writeLines(sprintf("var page = require('webpage').create();
               page.open('%s', function(){
               console.log(page.content);//page source
               phantom.exit();
               });",url), con=connection)

#input for the system command
system_input <- "/Users/thomasroelens/Documents/work/DeTijd/projects/current/2018/110718-mvp-db/phantomjs scrape.js > scrape.html"

#run the system command
system(system_input)

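The example above is based on this tutorial. I have also tried these two tutorials, but I run into the same problem every time.

When I run the command from the terminal, I also get an empty html page.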
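
One way to narrow this down is to make the script report a failed load and always exit, and to call phantomjs with a timeout so the R console cannot hang forever. The following is only a sketch under several assumptions: `build_script` is a hypothetical helper, `phantomjs` is assumed to be on the PATH, `scrape.log` is an arbitrary file name, the `timeout` argument of `system2()` needs R 3.5.0 or later, and it is only a guess that SSL handling is involved (hence the two SSL flags).

```r
# Hypothetical helper: builds a PhantomJS script that checks the load
# status and exits in every branch, with a watchdog timer as a fallback.
build_script <- function(url, timeout_ms = 30000) {
  sprintf("var system = require('system');
var page = require('webpage').create();
// Watchdog: exit even if the page.open callback never fires.
setTimeout(function () {
  system.stderr.writeLine('timed out');
  phantom.exit(2);
}, %d);
page.open('%s', function (status) {
  if (status !== 'success') {
    system.stderr.writeLine('load failed: ' + status);
    phantom.exit(1);
  } else {
    console.log(page.content); // page source
    phantom.exit(0);
  }
});", timeout_ms, url)
}

url <- "https://web.archive.org/web/20110831155858/http://aws.amazon.com/ec2/pricing"

# Only run when a phantomjs binary is found on the PATH (assumption).
if (nzchar(Sys.which("phantomjs"))) {
  writeLines(build_script(url), con = "scrape.js")
  exit_code <- system2(
    "phantomjs",
    args = c("--ignore-ssl-errors=true", "--ssl-protocol=any", "scrape.js"),
    stdout = "scrape.html", # page source goes here
    stderr = "scrape.log",  # failure messages go here
    timeout = 60            # seconds; stops R from waiting forever
  )
  print(exit_code)
}
```

A non-zero exit code, together with the contents of scrape.log, should at least show whether the page failed to load or the script never reached phantom.exit().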

R console freezes when running phantomjs with a system command

0 answers: