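I want to download a PDF file from the internet and save it to my local hard drive. After downloading, the resulting PDF contains many blank pages. How can I fix this?
Example: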
require(XML)
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')
Thanks in advance.
Answer 0 (score: 29)
Try downloading in binary mode (mode = "wb"), like this:
download.file(url, 'introductionToR.pdf', mode="wb")
That did the trick for me.
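A likely reason this helps: with the default text mode, download.file may translate line endings on Windows, which can corrupt a binary file such as a PDF. The sketch below wraps the download in binary mode and adds a quick check that the saved file starts with the "%PDF-" signature. The helper names download_pdf and looks_like_pdf are hypothetical, made up for illustration; only download.file, readBin, and charToRaw are base R functions.

```r
# Hypothetical helper: download a file in binary mode.
download_pdf <- function(url, destfile) {
  # mode = "wb" writes the bytes unchanged instead of treating them as text.
  status <- download.file(url, destfile, mode = "wb")
  # download.file returns 0 on success and a non-zero code otherwise.
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Hypothetical helper: check whether a file begins with the PDF signature.
looks_like_pdf <- function(path) {
  header <- readBin(path, what = "raw", n = 5)
  identical(header, charToRaw("%PDF-"))
}

# Usage (requires network access, so it is left commented out):
# download_pdf("http://cran.r-project.org/doc/manuals/R-intro.pdf", "introductionToR.pdf")
# looks_like_pdf("introductionToR.pdf")
```

The signature check only confirms that the file begins like a PDF; it does not validate the rest of the document.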
Answer 1 (score: -1)
You can use the tabulizer package to download PDFs and export their tables as data.frames:
https://ropensci.org/tutorials/tabulizer_tutorial.html
# ghit provides the install_github() function used below
install.packages("ghit")
# on 64-bit Windows
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"), INSTALL_opts = "--no-multiarch")
# elsewhere
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"))
library(tabulizer)
f2 <- "https://github.com/leeper/tabulizer/raw/master/inst/examples/data.pdf"
extract_tables(f2, pages = 1, method = "data.frame")