从URL下载PDF不是以pdf格式生成损坏的PDF文件

时间:2016-01-26 09:19:24

标签: scala

我使用scala使用以下代码从URL下载PDF文件,并且工作正常

var out: OutputStream = null;
var in: InputStream = null;
val url = new URL( """http://www.pdf995.com/samples/pdf.pdf""")
val connection = url.openConnection().asInstanceOf[HttpURLConnection]
connection.setRequestMethod("GET")
in = connection.getInputStream
val localfile = "sample2.pdf"
out = new BufferedOutputStream(new FileOutputStream(localfile))
val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
out.write(byteArray)

但是当我提供不以#34; PDF"结尾的网址时例如,下面给出的URL

https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=51&ved=0ahUKEwjq19ah8MbKAhXEj44KHeWAB6g4MhAWCBgwAA&url=http%3A%2F%2Fwww.us.fulbrightonline.org%2Fuploads%2Ffiles%2Fapplication_samples%2FForm9B_ETA_Reference_Form-Sample.pdf&usg=AFQjCNGZnon3ygHDJnW12Te8JrBR-o6jyw&sig2=OgSgD4HnUXZ9l_VS0AwGFg&bvm=bv.112454388,d.c2E&cad=rja

它不能正确生成PDF文件。打开PDF"不是PDF或损坏的错误"谈到。

1 个答案:

答案 0 :(得分:1)

如果您在最后读取了您的网址并删除了哈希值(.pdf之后的所有内容),您就会看到Google指向嵌入其中的链接:

https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=51&ved=0ahUKEwjq19ah8MbKAhXEj44KHeWAB6g4MhAWCBgwAA&url=http%3A%2F%2Fwww.us.fulbrightonline.org%2Fuploads%2Ffiles%2Fapplication_samples%2FForm9B_ETA_Reference_Form-Sample.pdf

这是直接链接(用于您的项目):

http://www.us.fulbrightonline.org/uploads/files/application_samples/Form9B_ETA_Reference_Form-Sample.pdf