Question

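I want to apologize in advance for the lack of a reproducible example. The data my script runs against is currently offline, and it is also confidential.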

I want to write a script that finds all the links on a site. The script works as follows:

* find homepage html to start with
* find all urls on this homepage 
* open these urls with Selenium
* save the html of each page in a list
* repeat this (find urls, open urls, save html)
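
For illustration, the steps above can be sketched as a small breadth-first loop. This is only a sketch under assumptions: `extract_links`, `crawl`, `fetch_page`, and `max_depth` are hypothetical names that do not come from the script, the regex-based link extraction stands in for a real HTML parser, and `fetch_page` is assumed to be any function that takes a URL and returns the page source (for example a wrapper around a Selenium session).

```r
# Hypothetical helper: pull href values out of one HTML string with a regex.
# A real scraper would more likely use an HTML parser.
extract_links <- function(html) {
  m <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  sub('^href="', "", sub('"$', "", m))
}

# Breadth-first crawl. fetch_page is any function(url) returning an HTML string.
crawl <- function(start_url, fetch_page, max_depth = 2) {
  seen <- start_url       # every URL queued so far
  frontier <- start_url   # URLs to open in the current round
  pages <- list()         # page sources, named by URL
  for (depth in seq_len(max_depth)) {
    if (length(frontier) == 0) break
    htmls <- lapply(frontier, fetch_page)
    names(htmls) <- frontier
    pages <- c(pages, htmls)
    # Collect links from this round and keep only the ones not seen before.
    found <- unique(as.character(unlist(lapply(htmls, extract_links),
                                        use.names = FALSE)))
    frontier <- setdiff(found, seen)
    seen <- c(seen, frontier)
  }
  pages
}

# Usage with an in-memory stand-in for a website (made-up data).
site <- list(
  "/"  = '<a href="/a">A</a> <a href="/b">B</a>',
  "/a" = '<a href="/b">B</a> <a href="/">home</a>',
  "/b" = '<p>no links</p>'
)
pages <- crawl("/", function(url) site[[url]], max_depth = 3)
```

Tracking `seen` separately from `frontier` keeps the loop from reopening pages that link back to each other.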

The workhorse of this script is the following function:

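function(listofhtmls) {
  # Extract the links from each page, then keep only the ones of interest.
  urls <- lapply(listofhtmls, scrape)
  urls <- lapply(urls, clean)
  urls <- unlist(urls)
  # Drop duplicates. unique() is also safe when there are none, whereas
  # urls[-which(duplicated(urls))] returns an empty vector in that case.
  urls <- unique(urls)
  # "base_url" is a placeholder for the site's real base URL.
  urls <- paste0("base_url", urls)
  # Open each URL and store its page source.
  html <- lapply(urls, savesource)
  list(html, urls)
}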

The pages are scraped, the URLs are cleaned (I don't need all of them), and duplicate URLs are removed.
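
A note on the duplicate-removal step: negative indexing through `which()`, as in `urls[-which(duplicated(urls))]`, behaves unexpectedly when there are no duplicates at all. The base-R sketch below uses made-up URLs to show the effect; `dropped`, `kept`, and `pasted` are illustrative names only.

```r
urls <- c("/a", "/b", "/c")   # made-up input with no duplicates

# which() returns integer(0) here, and indexing with -integer(0)
# selects nothing, so every URL is dropped.
dropped <- urls[-which(duplicated(urls))]

# Logical negation (or unique()) keeps the vector intact.
kept <- urls[!duplicated(urls)]

# paste() treats a zero-length argument as "", so only the prefix is left.
# "base_url" is the placeholder string from the post.
pasted <- paste("base_url", dropped, sep = "")
```

With input like this, the later steps would receive the bare prefix instead of any real URL, which might explain why the problem only shows up on some pages.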

All of this works fine for most pages, but occasionally the function fails with a strange error:

Error: '' does not exist in current working directory.
Called from: check_path(path)

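I can't see any connection between the working directory and the parsing that is being done. I'd like to resolve this error, since it is currently blocking the rest of my work. Thanks in advance, and apologies again for not providing a reproducible example.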
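
One possible reading of the error, offered as an assumption because `scrape` and `savesource` are not shown: the message has the form that `xml2::read_html()` produces when it is given a string containing no markup, which it then tries to open as a file path. An empty page source in `listofhtmls` would trigger that. A minimal base-R sketch of a guard follows; `keep_nonblank` is a hypothetical helper, not part of the script.

```r
# Hypothetical helper: keep only entries that are single, non-empty,
# non-NA strings, so blank page sources never reach the parser.
keep_nonblank <- function(sources) {
  ok <- vapply(
    sources,
    function(s) {
      is.character(s) && length(s) == 1 && !is.na(s) && nzchar(trimws(s))
    },
    logical(1)
  )
  sources[ok]
}

# Made-up input: one real page and several blank or missing ones.
src <- list("<html><a href='/x'>x</a></html>", "", NA_character_, "   ", NULL)
clean_src <- keep_nonblank(src)
```

Calling something like `keep_nonblank(listofhtmls)` before the `scrape` step would at least show whether blank sources are involved, since the error should then disappear.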

parsing

0 answers: