Question

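I am trying to scrape realtor.com for a school project. I have a working solution that uses a combination of the rvest and httr packages, but I want to migrate it to the RCurl package, specifically to the getURLAsynchronous() function. My scraper should run much faster if it can download multiple URLs at once.

Here is what I have so far: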

library(RCurl)
library(rvest)
library(httr)

urls <- c("http://www.realtor.com/realestateandhomes-search/Denver_CO/pg-1?pgsz=50", 
          "http://www.realtor.com/realestateandhomes-search/Denver_CO/pg-2?pgsz=50")

prop.info <- vector("list", length = 0)
for (j in seq_along(urls)) {
  # Iteratively builds the list by appending the results from each url
  prop.info <- c(prop.info, urls[[j]] %>%
                   GET(add_headers("user-agent" = "r")) %>% # requests the page with a custom user agent
                   read_html() %>% # creates the html object
                   html_nodes(".srp-item-body") %>% # grabs appropriate html element
                   html_text()) # converts it to a text vector
}

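This gives me output that I can readily work with. I fetch each page with GET(), read the html from the response, select the html nodes I need, and convert them to text. Where I run into trouble is when I try to implement something similar using RCurl.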

Here is what I have with RCurl, using the same URLs:

        getURLAsynchronous(urls) %>%
        read_html() %>% 
        html_node(".srp-item-details") %>%
        html_text

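When I call getURIAsynchronous() on the vector of URLs, not all of the information is downloaded. Honestly, I am not sure exactly what is being scraped. I do know that the result is very different from what my current solution returns.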

Any ideas what I am doing wrong? Or perhaps an explanation of how getURLAsynchronous() is supposed to work?
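One difference between the two snippets is worth illustrating. The loop parses one page at a time, while the RCurl version pipes the entire result into a single read_html() call. The sketch below assumes that getURLAsynchronous() returns a character vector with one HTML document per URL, so each element would need to be parsed separately. The helper name extract_items and the inline sample markup are hypothetical, used only to keep the example runnable offline. The commented-out download call at the end is an untested assumption, in particular that the httpheader option is forwarded to each request.

```r
library(rvest)

# Hypothetical helper: parse each downloaded page on its own, then pool the text.
# `pages` is assumed to be a character vector holding one HTML document per element.
extract_items <- function(pages, css = ".srp-item-body") {
  per_page <- lapply(pages, function(page) {
    page %>%
      read_html() %>%      # parse a single document
      html_nodes(css) %>%  # select every matching node on that page
      html_text()          # convert the nodes to a text vector
  })
  unlist(per_page, use.names = FALSE)
}

# Offline stand-ins for two downloaded pages (made-up markup, not realtor.com output).
pages <- c(
  "<html><body><div class='srp-item-body'>Home A</div><div class='srp-item-body'>Home B</div></body></html>",
  "<html><body><div class='srp-item-body'>Home C</div></body></html>"
)
items <- extract_items(pages)

# With the real site, the download step might look like this (an assumption, not verified):
# library(RCurl)
# pages <- getURLAsynchronous(urls, httpheader = c("User-Agent" = "r"))
# prop.info <- extract_items(pages)
```

The RCurl snippet above also differs from the loop in two other ways: it uses html_node() (first match only) with the selector ".srp-item-details", where the loop uses html_nodes() with ".srp-item-body". Either difference could account for part of the mismatch in output.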

R

0 answers: