Converting PDF to HTML with Aspose.Pdf for Cloud

Asked: 2017-10-16 17:09:11

Tags: c# pdf aspose aspose.pdf

I'm having trouble converting a PDF to HTML with Aspose.Pdf-Cloud v1.0.9.

Code:

public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
{
    var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);

    try
    {
        var apiResponse = pdfApi.PutConvertDocument("html", null,
            Path.GetFileNameWithoutExtension(fileName) + ".html", doc);

        if (apiResponse != null && apiResponse.Status.Equals("Ok"))
        {
            return apiResponse.ResponseStream;
        }

        throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
    }
    catch (Exception ex)
    {
        NLogger.LogError("ConvertPdfToHtml - " + ex);
        throw;
    }
}

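No matter what I upload (PDFs generated by Adobe or by SelectPdf), I get a 400 Bad Request. Has anyone had any luck with this?

So far, Aspose.Words has worked very well for me for doc/docx to HTML.

Update: After logging in to my account, it looks like an error is being generated behind the scenes:

Error: The method or operation is not implemented. Method: Convert document to the specified format online. Parameters: format 'html', url '', outPath 'testadobe.html'

This may be an Aspose SDK issue. I'll try contacting them, because the method is exposed in the SDK and works exactly as I need for docs; I just need it to work for PDFs as well.

Updated code: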

public byte[] ConvertPdfToHtml(byte[] doc, string fileName)
{
    var pdfApi = new PdfApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);
    var storageApi = new StorageApi(ConfigurationManager.AppSettings["AsposeKey"],
        ConfigurationManager.AppSettings["AsposeSID"], ConfigurationManager.AppSettings["AsposeUrl"]);

    // The converted output is stored under the same base name with an .html extension.
    var htmlName = Path.GetFileNameWithoutExtension(fileName) + ".html";

    try
    {
        // 1. Upload the PDF to cloud storage.
        storageApi.PutCreate(fileName, "", "", doc);

        // 2. Convert the stored PDF to HTML and write the result back to storage.
        var apiResponse = pdfApi.GetDocumentWithFormat(fileName, "html", "", "", htmlName);

        if (apiResponse != null && apiResponse.Status.Equals("Ok"))
        {
            // 3. Download the result, which arrives as a zip archive.
            var storageRes = storageApi.GetDownload(htmlName, null, "");

            // 4. Pull the .html file out of the archive.
            return ZipExtractor.ExtractHtmlFromZip(storageRes.ResponseStream, htmlName);
        }

        throw new Exception("Couldn't convert pdf - " + fileName + " to HTML...");
    }
    catch (Exception ex)
    {
        NLogger.LogError("ConvertPdfToHtml - " + ex);
        throw;
    }
}

The unzip function, for posterity:

public static byte[] ExtractHtmlFromZip(byte[] zipBytes, string fileName)
{
    // Validate the input itself; a freshly constructed MemoryStream is never null.
    if (zipBytes == null || zipBytes.Length == 0)
        throw new ArgumentException("zipBytes doesn't contain any bytes...", "zipBytes");

    using (var zipStream = new MemoryStream(zipBytes))
    using (var archive = new ZipArchive(zipStream))
    {
        foreach (var zipEntry in archive.Entries)
        {
            // Keep scanning until the requested entry is found.
            if (zipEntry.FullName != fileName) continue;

            using (var fileStream = zipEntry.Open())
            using (var ms = new MemoryStream())
            {
                fileStream.CopyTo(ms);
                return ms.ToArray();
            }
        }
    }

    // Reached only after every entry has been checked without a match.
    throw new FileNotFoundException("Couldn't find " + fileName + " in zip archive...");
}

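1 Answer:

Answer 0: (score: 1)

We have two APIs for converting a PDF document to HTML:

  1. GET /v{version}/pdf/{name}
  2. PUT /v{version}/pdf/convert

I suggest you use the first one. The following cURL example will help you understand the API.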

    curl -v "http://api.aspose.cloud/v1.1/pdf/Sample.pdf?format=html&appSID=B01A15E5-1B83-4B9A-8EB3-0F2BFA6AC766&signature=hHUw2HKmLY6tQFEevDg52uOLKak" \
    -X GET \
    -H "Content-Type: application/json" \
    -H "Accept: multipart/form-data" \
    -o Sample_out.zip 
    

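As you can see, I set the output (-o) file extension to .zip rather than .html. The converted output consists of several files (.html, .css, and image files), so the API compresses them into a single archive.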
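Because the archive holds the stylesheet and images alongside the HTML, it can be useful to unpack every entry rather than only the .html file. The following is a minimal sketch of that idea using only System.IO.Compression; the class name ZipSketch and the method ExtractAll are hypothetical names chosen for illustration and are not part of any Aspose SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipSketch
{
    // Reads every file entry of a zip archive into memory,
    // keyed by the entry's full path inside the archive.
    // Hypothetical helper, not part of the Aspose SDK.
    public static Dictionary<string, byte[]> ExtractAll(byte[] zipBytes)
    {
        var files = new Dictionary<string, byte[]>();

        using (var zipStream = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (var entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name)) continue;

                using (var entryStream = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    entryStream.CopyTo(buffer);
                    files[entry.FullName] = buffer.ToArray();
                }
            }
        }

        return files;
    }
}
```

The caller can then look up the .html entry by name and write the remaining entries next to it, so that relative links to the CSS and images keep working.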

This cURL example uses Sample.pdf as the source file.
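For readers calling the endpoint from C# without the SDK, the URL in the cURL command can be assembled as in the rough sketch below. It only mirrors the shape of that URL; the class and method names are hypothetical, and the signature parameter is deliberately left out because the signing step is not described in this thread.

```csharp
using System;

public static class AsposeUrlSketch
{
    // Builds the unsigned URL for GET /v{version}/pdf/{name}?format=...&appSID=...
    // Hypothetical helper for illustration; the "signature" query parameter
    // seen in the cURL example still has to be appended by the caller.
    public static string BuildConvertUrl(string baseUrl, string version,
        string name, string format, string appSid)
    {
        return string.Format("{0}/v{1}/pdf/{2}?format={3}&appSID={4}",
            baseUrl.TrimEnd('/'),
            version,
            Uri.EscapeDataString(name),
            Uri.EscapeDataString(format),
            Uri.EscapeDataString(appSid));
    }
}
```

For example, BuildConvertUrl("http://api.aspose.cloud", "1.1", "Sample.pdf", "html", appSid) yields the same path and format parameter as the cURL command, with the caller's own appSID.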

P.S. I work as a Developer Evangelist at Aspose.