Question

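I want to send a URL as a query string, e.g.

localhost/abc.aspx?url=http://www.site.com/report.pdf

and detect whether that URL returns a PDF file. If it returns a PDF, it should be saved automatically; otherwise an error should be raised.

Some pages use a handler to serve the file, so I want to detect and download those as well.

localhost/abc.aspx?url=http://www.site.com/page.aspx?fileId=223344

The URL above might return a PDF file.

What is the best way to capture this?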

Thanks

Answer 1

You can download the PDF like this:

// Requires: using System; using System.IO; using System.Net;
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
// GetResponse() returns a WebResponse, so it has to be cast;
// the using block closes the response when we are done
using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
{
    // Check the file type returned, dropping parameters such as "; charset=..."
    string fileType = null;
    string contentType = response.ContentType;
    if (contentType != null)
    {
        string[] splitString = contentType.Split(';');
        fileType = splitString[0].Trim();
    }

    // See if it's a PDF
    if (string.Equals(fileType, "application/pdf", StringComparison.OrdinalIgnoreCase))
    {
        // Save it. A response stream does not support Length and a single
        // Read call may return only part of the data, so copy it instead.
        using (Stream stream = response.GetResponseStream())
        using (FileStream fileStream = File.Create(fileFullPath))
        {
            stream.CopyTo(fileStream);
        }
    }
}

It doesn't matter that the file may come from an .aspx handler; what is used is the MIME type returned in the server response.

If you get a generic MIME type, such as application/octet-stream, then you will have to use a more heuristic approach.
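As a rough sketch of that decision, the Content-Type header could be sorted into three cases before any sniffing is done. The names `MimeCheck`, `PdfVerdict` and `Classify` are hypothetical, and treating a missing header as "needs sniffing" is an assumption rather than something the answer states.

```csharp
using System;

public static class MimeCheck
{
    public enum PdfVerdict { IsPdf, NeedsSniffing, NotPdf }

    // Hypothetical helper: classifies a Content-Type header value.
    public static PdfVerdict Classify(string contentType)
    {
        // Assumption: no header at all is treated like a generic type
        if (string.IsNullOrWhiteSpace(contentType))
            return PdfVerdict.NeedsSniffing;

        // Drop parameters such as "; charset=utf-8"
        string mediaType = contentType.Split(';')[0].Trim();

        if (mediaType.Equals("application/pdf", StringComparison.OrdinalIgnoreCase))
            return PdfVerdict.IsPdf;

        // Generic binary type: the header alone cannot tell us what the file is
        if (mediaType.Equals("application/octet-stream", StringComparison.OrdinalIgnoreCase))
            return PdfVerdict.NeedsSniffing;

        return PdfVerdict.NotPdf;
    }
}
```

Only the `NeedsSniffing` case would then go on to inspect the file contents.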

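Assuming you can't simply go by the file extension (e.g. when it is .aspx), you can first copy the file to a MemoryStream (see How to get a MemoryStream from a Stream in .NET?). Once you have a memory stream of the file, you can take a "cheeky" peek at it (I say cheeky because it is not the correct way to parse a PDF file).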
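The copy into a MemoryStream could look something like the following minimal sketch. `StreamBuffering.ToMemoryStream` is a hypothetical helper name, not taken from the linked question, and it assumes .NET 4.0 or later for `Stream.CopyTo`.

```csharp
using System.IO;

public static class StreamBuffering
{
    // Hypothetical helper: buffers a forward-only stream (such as a response
    // stream) into memory so it can be inspected and then rewound.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        var memoryStream = new MemoryStream();
        source.CopyTo(memoryStream);   // reads until the source is exhausted
        memoryStream.Position = 0;     // rewind so callers read from the start
        return memoryStream;
    }
}
```

Note that this holds the whole file in memory, which is probably acceptable for typical PDFs but worth keeping in mind for very large downloads.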

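I'm not an expert on the PDF format, but I believe that reading the first 5 characters with an ASCII reader will yield "%PDF-", so you can identify it with: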

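bool isPDF;
// leaveOpen: true (last argument) keeps memoryStream usable after the reader
// is disposed; without it the Position reset below would throw
using (StreamReader srAsciiFromStream = new StreamReader(memoryStream,
    System.Text.Encoding.ASCII, false, 1024, true))
{
    // Read just the first 5 characters instead of a whole "line" of binary data
    char[] header = new char[5];
    int read = srAsciiFromStream.ReadBlock(header, 0, header.Length);
    isPDF = read == header.Length && new string(header) == "%PDF-";
}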

//set the memory stream back to the start so you can save the file
memoryStream.Position = 0;
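If you would rather not involve a text reader at all, the same heuristic can be sketched by comparing the raw bytes. `PdfSniffer.LooksLikePdf` is a hypothetical name, it assumes a seekable stream such as a MemoryStream, and like the check above it is only a guess at the file type, not a validation of the PDF.

```csharp
using System.IO;

public static class PdfSniffer
{
    // ASCII bytes for "%PDF-"
    private static readonly byte[] Signature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Hypothetical helper: returns true if the stream starts with "%PDF-".
    // Requires a seekable stream; restores the caller's position afterwards.
    public static bool LooksLikePdf(Stream seekable)
    {
        long start = seekable.Position;
        seekable.Position = 0;
        try
        {
            var header = new byte[Signature.Length];
            int total = 0;
            // Read may return fewer bytes than requested, so loop until full or EOF
            while (total < header.Length)
            {
                int n = seekable.Read(header, total, header.Length - total);
                if (n == 0) return false; // stream is shorter than the signature
                total += n;
            }
            for (int i = 0; i < Signature.Length; i++)
            {
                if (header[i] != Signature[i]) return false;
            }
            return true;
        }
        finally
        {
            seekable.Position = start;
        }
    }
}
```

Because the position is restored in the `finally` block, the stream can be saved to disk afterwards without an extra reset.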

Download a PDF from a third party using ASP.NET HttpWebRequest / HttpWebResponse
