Question

I am trying to download some images from a website. I have a list of image URLs that I need to download, so I run it with this code:

 dlphoto <- function(x){
   print(x)
   setTimeLimit(5)
   Sys.sleep(0.3)
   download.file(x , destfile = basename(x))
   }

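This function has one major problem: when I run it on a vector of 15000 URLs, it freezes the whole R session, which then stops responding to anything. However, if I run the URLs individually, it works fine. It also works when I run, for example, URLs 1:50. But when I run 1:100 together, it freezes as well... Could you please help me solve this problem?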

At first I used this line to call it:

 dlphoto(allimage[,2])

Then I changed it to this:

 dlphoto(allimage[c(1:50),2])
 dlphoto(allimage[c(51:100),2])
 dlphoto(allimage[c(101:150),2])
 dlphoto(allimage[c(151:200),2])
 # and so on until 15000

But it still freezes a lot. Every time it dies, I have to close R, find out where the process got to, and restart from there. I often get this message:

   Error in download.file(x, destfile = basename(x)) : 
   reached CPU time limit

Also, could you help me save the downloaded photos in this folder:

    /Users/name/Desktop/M2/Mémoire M2/Scrapingtest/photos

Thanks a lot!!

Answer 1

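A few improvements are possible. I assume the OP is using base R's download.file (from the utils package). If method is not set to "libcurl", download.file handles only a single file per call. With method = "libcurl" and quiet = TRUE, it handles multiple files in one call.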

Hence the fix is to use method = "libcurl" and quiet = TRUE in the download.file call. The changed function:

 dlphoto <- function(x){
   print(x)
   download.file(x, destfile = basename(x), method = "libcurl", quiet = TRUE)
 }

OR call download.file directly:

 download.file(x, destfile = basename(x), method = "libcurl", quiet = TRUE)

Note: in both cases above, no progress bar is displayed.

I think the default value of the timeout option (see options()) is enough to ensure that download.file returns if a download is delayed.
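Here is a minimal sketch of the vectorised libcurl approach. It also saves the files into the folder asked about in the question instead of the working directory. The helper names build_destfiles() and dlphoto_libcurl() are hypothetical. So is the timeout argument, which is in seconds and is applied through options(timeout = ...) only for the duration of the call.

```r
# Sketch only: build_destfiles() and dlphoto_libcurl() are hypothetical helpers.

# Map each URL to "<dest_dir>/<file name taken from the URL>"
build_destfiles <- function(urls, dest_dir) {
  file.path(dest_dir, basename(urls))
}

dlphoto_libcurl <- function(urls, dest_dir, timeout = 60) {
  # Create the target folder if it does not exist yet
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  # Temporarily change the download timeout (seconds); restore the old value on exit
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  # Vectors of URLs and destination files in a single libcurl call
  download.file(urls, destfile = build_destfiles(urls, dest_dir),
                method = "libcurl", quiet = TRUE)
}
```

For example: dlphoto_libcurl(allimage[1:50, 2], "/Users/name/Desktop/M2/Mémoire M2/Scrapingtest/photos"). file.path() treats the space and the accented character in that path as ordinary characters.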

The return value should be checked for errors: any non-zero return value indicates failure.
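One way to act on the return value, sketched under assumptions: check_download() is a hypothetical wrapper that calls download.file for a single URL. It turns an error, such as a failed connection, into status 1 instead of stopping the whole run. It then returns the integer status so the caller can compare it with 0. The downloader argument is also hypothetical and exists only so the wrapper can be exercised without a network.

```r
# Sketch only: check_download() and its `downloader` argument are hypothetical.
check_download <- function(url, destfile, downloader = utils::download.file, ...) {
  status <- tryCatch(
    downloader(url, destfile = destfile, ...),
    error = function(e) {
      # Report the error and map it to a non-zero status
      message("Error for ", url, ": ", conditionMessage(e))
      1L
    }
  )
  as.integer(status)  # 0 = success, anything else = failure
}
```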

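If you want to see a progress bar (probably not needed for 15000 files at a time), the function should be modified to handle one file at a time. The modified function would be:

 # This function displays a progress bar for each file
 dlphoto <- function(x){
   for(file in x){
     print(file)
     download.file(file, destfile = basename(file))
   }
 }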
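Below is a sketch of the one-file-at-a-time loop, extended to cover the other two points in the question. It assumes the URLs are in allimage[, 2]. The loop saves into a chosen folder and skips files that already exist, so rerunning it after a freeze resumes where the last run stopped. It returns the URLs that failed. dlphoto_resume() and its pause and downloader arguments are hypothetical.

The sketch also leaves out setTimeLimit(5). The first argument of setTimeLimit() is cpu, and with the default transient = FALSE that 5-second CPU limit applies to each later top-level computation. That would be consistent with the "reached CPU time limit" error once one call covers many files. In a session where the limit is already set, setTimeLimit() called with its default arguments (Inf) lifts it.

```r
# Sketch only: dlphoto_resume() and its `pause`/`downloader` arguments are hypothetical.
dlphoto_resume <- function(urls, dest_dir, pause = 0.3,
                           downloader = utils::download.file) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  failed <- character(0)
  for (url in urls) {
    dest <- file.path(dest_dir, basename(url))
    if (file.exists(dest)) next  # already downloaded in an earlier run
    print(url)
    # mode = "wb" writes binary files (matters for images on Windows)
    status <- tryCatch(downloader(url, destfile = dest, mode = "wb"),
                       error = function(e) 1L)
    if (!identical(as.integer(status), 0L)) {
      failed <- c(failed, url)
      if (file.exists(dest)) file.remove(dest)  # drop a partial file so it is retried
    }
    Sys.sleep(pause)  # short pause between requests, as in the question
  }
  invisible(failed)
}
```

For example: failed <- dlphoto_resume(allimage[, 2], "/Users/name/Desktop/M2/Mémoire M2/Scrapingtest/photos"). Afterwards, dlphoto_resume(failed, ...) retries only the failures.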
