Make a PDF conforming to PDF/A with only images using iTextSharp

Asked: 2013-04-09 08:08:38

Tags: c# .net pdf pdf-generation itextsharp

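I am using iTextSharp to generate PDF/A documents from images. So far I have not been successful. Edit: I am using iTextSharp to generate the PDF.

All I am trying to do is make a PDF/A document (1a or 1b, whichever is suitable) containing some images. This is the code I have come up with so far, but I keep getting errors when I validate the output with pdf-tools or validatepdfa.

This is the error I get from pdf-tools (using PDF/A-1b validation): Edit: MarkInfo and the color space are still not working. The rest is fine.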

Validating file "0.pdf" for conformance level pdfa-1a
The key MarkInfo is required but missing.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document doesn't provide appropriate logical structure information.
Done.

Main flow

var output = new MemoryStream();
using (var iccProfileStream = new FileStream("ToPdfConverter/ColorProfiles/sRGB_v4_ICC_preference_displayclass.icc", FileMode.Open))
{
    var document = new Document(new Rectangle(PageSize.A4.Width, PageSize.A4.Height), 0f, 0f, 0f, 0f);
    var pdfWriter = PdfWriter.GetInstance(document, output);
    pdfWriter.PDFXConformance = PdfWriter.PDFA1A;
    document.Open();

    var pdfDictionary = new PdfDictionary(PdfName.OUTPUTINTENT);
    pdfDictionary.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.S, PdfName.GTS_PDFA1);

    var iccProfile = ICC_Profile.GetInstance(iccProfileStream);
    var pdfIccBased = new PdfICCBased(iccProfile);
    pdfIccBased.Remove(PdfName.ALTERNATE);
    pdfDictionary.Put(PdfName.DESTOUTPUTPROFILE, pdfWriter.AddToBody(pdfIccBased).IndirectReference);

    pdfWriter.ExtraCatalog.Put(PdfName.OUTPUTINTENT, new PdfArray(pdfDictionary));

    var image = PrepareImage(imageBytes);

    document.Open();
    document.Add(image);

    pdfWriter.CreateXmpMetadata();

    pdfWriter.CloseStream = false;
    document.Close();
}
return output.GetBuffer();

This is PrepareImage().
It flattens the image to BMP so that I do not have to deal with alpha channels.

private Image PrepareImage(Stream stream)
{
    Bitmap bmp = new Bitmap(System.Drawing.Image.FromStream(stream));
    var file = new MemoryStream();
    bmp.Save(file, ImageFormat.Bmp);
    var image = Image.GetInstance(file.GetBuffer());

    if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
    {
        image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
    }
    return image;
}

Can anyone point me in the right direction to fix these errors? Especially the one about device-specific color spaces.

Edit: More explanation: what I want to achieve is converting scanned images to PDF/A for long-term data storage.

Edit: Added some of the files I am using for testing.
PDFs and Pictures.rar (3.9 MB)
https://mega.co.nz/#!n8pClYgL!NJOJqSO3EuVrqLVyh3c43yW-u_U35NqeB0svc6giaSQ

2 answers:

Answer 0: (score: 1)

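OK, I checked one of your files in callas pdfToolbox and it says: "Device color space used but no PDF/A output intent". I took that as a sign that something went wrong when you wrote the output intent to the document. I then converted that document to PDF/A-1b with the same tool, and the difference is obvious.

There may be other errors to fix, but the first one is that you put a key named "OutputIntent" in the catalog dictionary of the PDF file. That is wrong: page 75 of the PDF specification states that the key should be named "OutputIntents".

As I said, your file may have other problems, but the wrong key name means PDF/A validators cannot find the output intent you tried to put into the file...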
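The key-name fix described above can be sketched as follows. This is a minimal sketch that reuses the calls from the question's code and changes only the catalog key; the class name OutputIntentSketch and the method name AddSrgbOutputIntent are hypothetical, and it assumes an iTextSharp 5.x version in which PdfName.OUTPUTINTENTS is defined. It addresses only the key name, so other validation errors may remain.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class OutputIntentSketch
{
    // Hypothetical helper, not part of iTextSharp.
    // Call it after document.Open(), as the question's code does.
    public static void AddSrgbOutputIntent(PdfWriter pdfWriter, Stream iccProfileStream)
    {
        // The dictionary's own type is singular: /Type /OutputIntent.
        var outputIntent = new PdfDictionary(PdfName.OUTPUTINTENT);
        outputIntent.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
        outputIntent.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
        outputIntent.Put(PdfName.S, PdfName.GTS_PDFA1);

        // Embed the ICC profile and reference it from the output intent.
        var iccProfile = ICC_Profile.GetInstance(iccProfileStream);
        var pdfIccBased = new PdfICCBased(iccProfile);
        pdfIccBased.Remove(PdfName.ALTERNATE);
        outputIntent.Put(PdfName.DESTOUTPUTPROFILE,
            pdfWriter.AddToBody(pdfIccBased).IndirectReference);

        // The catalog key is plural: /OutputIntents (the question used the singular name).
        pdfWriter.ExtraCatalog.Put(PdfName.OUTPUTINTENTS, new PdfArray(outputIntent));
    }
}
```

Only the last statement differs from the question's code: the dictionary keeps its singular type name, while the catalog entry that points to it uses the plural key.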

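Answer 1: (score: 0)

  1. First of all, PDF/X is not PDF/A.
  2. Secondly, you are using the wrong PdfWriter. It should be PdfAWriter.
  3. Unfortunately I do not have a solution for your problem with images, but I do have one for points 1 and 2.

    Best regards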

    using System;
    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using System.Text;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.html.simpleparser;
    using iTextSharp.tool.xml;
    using System.Drawing;
    using System.Drawing.Imaging;
    
    namespace Tests
    {
        /*
         * References:  
         * UTF-8 encoding http://stackoverflow.com/questions/4902033/itextsharp-5-polish-character
         * PDFA http://www.codeproject.com/Questions/661704/Create-pdf-A-using-itextsharp
         * Images http://stackoverflow.com/questions/15896581/make-a-pdf-conforming-pdf-a-with-only-images-using-itextsharp
         */
    
        [TestClass]
        public class UnitTest1
        {
            /*
             * IMPORTANT: Restrictions with html usage of tags and attributes
             * 1. Don't use <head> <title>Sklep</title> </head>, because the title is rendered to the page
             */
    
            // Test cases
            static string contents = "<html><body style=\"font-family:arial unicode ms;font-size: 8px;\"><p style=\"text-align: center;\"> Davčna številka dolžnika: 74605968<br /> </p><table> <tr> <td><b>\u0160t. sklepa: 88711501</b></td> <td style=\"text-align: right;\">Davčna številka dolžnika: 74605968</td> </tr> </table> <br/><img src=\"http://img.rtvslo.si/_static/images/rtvslo_mmc_logo.png\" /></body></html>";
            //static string contents = "<html><body style=\"font-family:arial unicode ms;font-size: 8px;\"><p style=\"text-align: center;\"> Davčna številka dolžnika: 74605968<br /> </p><table> <tr> <td><b>\u0160t. sklepa: 88711501</b></td> <td style=\"text-align: right;\">Davčna številka dolžnika: 74605968</td> </tr> </table> <br/></body></html>";
    
            //[TestMethod]
            public void CreatePdfHtml()
            {
                createPDF(contents, true);        
            }
    
            private void createPDF(string html, bool isPdfa)
            {
                TextReader reader = new StringReader(html);
                Document document = new Document(PageSize.A4, 30, 30, 30, 30);
                HTMLWorker worker = new HTMLWorker(document);
    
                PdfWriter writer;
                if (isPdfa)
                {
                    //set conformity level
                    writer = PdfAWriter.GetInstance(document, new FileStream(@"c:\temp\testA.pdf", FileMode.Create), PdfAConformanceLevel.PDF_A_1B);
    
                    //set pdf version
                    writer.SetPdfVersion(PdfAWriter.PDF_VERSION_1_4);
    
                    // Create XMP metadata. It's a PDF/A requirement.
                    writer.CreateXmpMetadata();
                }
                else
                {
                    writer = PdfWriter.GetInstance(document, new FileStream(@"c:\temp\test.pdf", FileMode.Create));
                }
    
                document.Open();
    
                if (isPdfa) // the document must already be opened, or this will fail
                {
                    // Set output intent for uncalibrated color space. PDF/A requirement.
                    ICC_Profile icc = ICC_Profile.GetInstance(Environment.GetEnvironmentVariable("SystemRoot") +  @"\System32\spool\drivers\color\sRGB Color Space Profile.icm");
                    writer.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
                }
    
                //register font used in html
                FontFactory.Register(Environment.GetEnvironmentVariable("SystemRoot") + "\\Fonts\\ARIALUNI.TTF", "arial unicode ms");
    
                //adding custom style attributes to html specific tasks. Can be used instead of css
                //this one is a must for displaying UTF-8 language-specific characters (čćžđpš)
                iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
                ST.LoadTagStyle("body", "encoding", "Identity-H");
                worker.SetStyleSheet(ST);
    
                worker.StartDocument();
                worker.Parse(reader);
                worker.EndDocument();
                worker.Close();
                document.Close();
            }
    
        }
    
    
    }
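
The writer setup above could be combined with the image handling from the question roughly as follows. This is an untested sketch, not a verified solution to the image problem: the class name ImagePdfASketch, the method name CreateImagePdfA and its parameters are hypothetical, it assumes the itextsharp.pdfa assembly is referenced, and it assumes the image bytes carry no alpha channel, since PDF/A-1 does not allow transparency. The result should still be checked with a validator.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImagePdfASketch
{
    // Hypothetical helper: wraps one image in a PDF/A-1b document and returns the bytes.
    public static byte[] CreateImagePdfA(byte[] imageBytes, string iccProfilePath)
    {
        using (var output = new MemoryStream())
        {
            var document = new Document(PageSize.A4, 0f, 0f, 0f, 0f);

            // PdfAWriter instead of PdfWriter, with the conformance level set up front.
            PdfAWriter writer = PdfAWriter.GetInstance(document, output, PdfAConformanceLevel.PDF_A_1B);
            writer.CloseStream = false;

            // XMP metadata is a PDF/A requirement.
            writer.CreateXmpMetadata();

            document.Open();

            // Same output-intent call as in the answer's code; the document must be open.
            ICC_Profile icc = ICC_Profile.GetInstance(iccProfilePath);
            writer.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

            // Scale the image down only if it is larger than the page, as in the question.
            Image image = Image.GetInstance(imageBytes);
            if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
            {
                image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
            }
            document.Add(image);

            document.Close();

            // ToArray copies only the bytes written; GetBuffer can include unused capacity.
            return output.ToArray();
        }
    }
}
```

A call might look like `File.WriteAllBytes("scan.pdf", ImagePdfASketch.CreateImagePdfA(bytes, profilePath));`, where the file name and both variables are placeholders.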