Question

I want to download the HTML source code of a web page. How can I do this?

I tried using read_html from the xml2 package, but I get an error message:

test <- read_html('https://www.epicurious.com/search/Tropical%20Glazed%20Ham%20with%20Curried%20Pineapple%20Chutney')
Error in open.connection(x, "rb") : HTTP error 400.

In Mozilla, I can see the source code via View Page Source.

Answer 1

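read_html seems to time out for me with the URL you provided. To work around this, first use download.file to save the raw HTML to a destination file on the file system, then call read_html on that file: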

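library(xml2)

# Temporary destination file for the raw HTML
temp <- tempfile(fileext = ".html")

download.file('https://www.epicurious.com/search/Tropical%20Glazed%20Ham%20with%20Curried%20Pineapple%20Chutney',
              destfile = temp)

# Raw source code, one character string per line
res <- readLines(temp)

# Parse the saved file into an xml2 document
parsed <- read_html(temp)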
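Because read_html() also accepts a local file path, nothing further touches the network once the page is saved. Here is a minimal offline sketch of that second step, assuming the xml2 package is installed. The small HTML document written below is made-up stand-in content for the downloaded page, not the real Epicurious markup:

```r
library(xml2)

# Stand-in for a page saved by download.file(): hypothetical HTML content
# written to a temporary file, for illustration only.
saved <- tempfile(fileext = ".html")
writeLines(c("<html><head><title>Demo</title></head>",
             "<body><h1>Example heading</h1></body></html>"), saved)

# Raw source: one character string per line of the saved file
src <- readLines(saved)

# Parsed document: read_html() reads the local path just like a URL
doc <- read_html(saved)
title <- xml_text(xml_find_first(doc, "//title"))
heading <- xml_text(xml_find_first(doc, "//h1"))
```

Keeping both `src` (the untouched source text) and `doc` (the parsed tree) lets you inspect the raw HTML while still querying it with XPath.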
