I am using Scala to download a PDF file from a URL with the following code, and it works fine:
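import java.io.{BufferedOutputStream, FileOutputStream, InputStream, OutputStream}
import java.net.{HttpURLConnection, URL}

var out: OutputStream = null
var in: InputStream = null
try {
  val url = new URL("http://www.pdf995.com/samples/pdf.pdf")
  val connection = url.openConnection().asInstanceOf[HttpURLConnection]
  connection.setRequestMethod("GET")
  in = connection.getInputStream
  val localfile = "sample2.pdf"
  out = new BufferedOutputStream(new FileOutputStream(localfile))
  // Read one byte at a time until end of stream (-1), then write everything at once
  val byteArray = Stream.continually(in.read).takeWhile(_ != -1).map(_.toByte).toArray
  out.write(byteArray)
} finally {
  // Closing flushes the buffer; without it the file on disk may be incomplete
  if (out != null) out.close()
  if (in != null) in.close()
}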
But when I provide a URL that does not end in ".pdf", for example the URL given below:
https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=51&ved=0ahUKEwjq19ah8MbKAhXEj44KHeWAB6g4MhAWCBgwAA&url=http%3A%2F%2Fwww.us.fulbrightonline.org%2Fuploads%2Ffiles%2Fapplication_samples%2FForm9B_ETA_Reference_Form-Sample.pdf&usg=AFQjCNGZnon3ygHDJnW12Te8JrBR-o6jyw&sig2=OgSgD4HnUXZ9l_VS0AwGFg&bvm=bv.112454388,d.c2E&cad=rja
it does not produce a valid PDF file. Opening the file gives a "not a PDF or corrupted" error.
Answer 0 (score: 1)
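The Google URL is a redirect link rather than the PDF itself, so what gets saved is most likely not PDF data. If you read your URL through to the end and strip the trailing parameters (everything after .pdf), you will see that the Google link points to a link embedded in it: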
https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=51&ved=0ahUKEwjq19ah8MbKAhXEj44KHeWAB6g4MhAWCBgwAA&url=http%3A%2F%2Fwww.us.fulbrightonline.org%2Fuploads%2Ffiles%2Fapplication_samples%2FForm9B_ETA_Reference_Form-Sample.pdf
This is the direct link (use this one in your project):
http://www.us.fulbrightonline.org/uploads/files/application_samples/Form9B_ETA_Reference_Form-Sample.pdf
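If you would rather not pull the direct link out by hand, the same step can be sketched in code. This is only a minimal sketch: `extractTargetUrl` is a hypothetical helper name, and it assumes the redirect link carries the real address percent-encoded in a query parameter called `url`, as the Google link above does. Other redirect services may use a different parameter name.

```scala
import java.net.URLDecoder

// Hypothetical helper (not part of any library): returns the decoded value of
// the `url` query parameter of a redirect link, or None if it has none.
def extractTargetUrl(redirect: String): Option[String] = {
  // Keep only the query string: the text after '?' and before any '#'
  val query = redirect.dropWhile(_ != '?').drop(1).takeWhile(_ != '#')
  query.split('&').toSeq
    .map(_.split("=", 2))            // split each pair into name and value
    .collectFirst {
      // The value is percent-encoded (e.g. %3A%2F%2F), so decode it
      case Array("url", value) => URLDecoder.decode(value, "UTF-8")
    }
}

// Usage: fall back to the original address when there is nothing to extract
val given = "https://www.google.com.pk/url?sa=t&url=http%3A%2F%2Fexample.com%2Fa.pdf&usg=x"
val direct = extractTargetUrl(given).getOrElse(given)
```

The resulting `direct` string can then be passed to `new URL(...)` in the download code from the question.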