Question

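Using the iTextSharp library, I am able to insert metadata into a PDF file using various schemas.

For my purposes, the keywords in the Keywords metadata need to be delimited by commas and enclosed in double quotes. Once the script I have written is run, the keywords are enclosed in triple quotes.

Any ideas on how to avoid this, or any advice on using XMP?

Example of the required metadata: "keyword1","keyword2","keyword3"

Example of the current metadata: """keyword1"",""keyword2"",""keyword3"""

Code: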

            // Split the line once, then rejoin fields 1 through 7 with commas.
            string[] fields = meta_line.Split(',');
            string _keywords = string.Join(",", fields, 1, 7);
            // Replace every '~' with a comma.
            _keywords = _keywords.Replace('~', ',');

            Console.WriteLine(metaFile);

            foreach (string inputFile in Directory.GetFiles(source, "*.pdf", SearchOption.TopDirectoryOnly))
            {
                if (Path.GetFileName(metaFile) == Path.GetFileName(inputFile))
                {
                    string outputFile = source + @"\output\" + Path.GetFileName(inputFile);
                    PdfReader reader = new PdfReader(inputFile);

                    using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
                    {

                        PdfStamper stamper = new PdfStamper(reader, fs);
                        Dictionary<String, String> info = reader.Info;
                        stamper.MoreInfo = info;

                        PdfWriter writer = stamper.Writer;

                        byte[] buffer = new byte[65536];

                        System.IO.MemoryStream ms = new System.IO.MemoryStream(buffer, true);
                        try
                        {
                            iTextSharp.text.xml.xmp.XmpSchema dc = new iTextSharp.text.xml.xmp.DublinCoreSchema();

                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.TITLE, new iTextSharp.text.xml.xmp.LangAlt(_title));

                            iTextSharp.text.xml.xmp.XmpArray subject = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
                            subject.Add(_subject);
                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.SUBJECT, subject);

                            iTextSharp.text.xml.xmp.XmpArray author = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
                            author.Add(_author);
                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.CREATOR, author);

                            PdfSchemaAdvanced pdf = new PdfSchemaAdvanced();

                            pdf.AddKeywords(_keywords);


                            iTextSharp.text.xml.xmp.XmpWriter xmp = new iTextSharp.text.xml.xmp.XmpWriter(ms);
                            xmp.AddRdfDescription(dc);
                            xmp.AddRdfDescription(pdf);
                            xmp.Close();

                            int bufsize = buffer.Length;
                            int bufcount = 0;
                            foreach (byte b in buffer)
                            {
                                if (b == 0) break;
                                bufcount++;
                            }
                            System.IO.MemoryStream ms2 = new System.IO.MemoryStream(buffer, 0, bufcount);
                            buffer = ms2.ToArray();

                            foreach (char buff in buffer)
                            {
                                Console.Write(buff);
                            }
                            writer.XmpMetadata = buffer;
                        }
                        catch (Exception ex)
                        {
                            throw; // rethrow without resetting the stack trace
                        }
                        finally
                        {
                            ms.Close();
                            ms.Dispose();
                        }

                        stamper.Close();
                     // writer.Close();

                    }

                    reader.Close();
                }
            }

The following approach did not add any metadata, and I am not sure why (point 3 in the comments):

    iTextSharp.text.xml.xmp.XmpArray keywords = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
    keywords.Add("keyword1");
    keywords.Add("keyword2");
    keywords.Add("keyword3");

    pdf.SetProperty(iTextSharp.text.xml.xmp.PdfSchema.KEYWORDS, keywords);

Answer 1

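I don't have the newest iTextSharp version at hand. I have itextsharp 5.1.1.0. It does not contain a PdfSchemaAdvanced class, but it does contain PdfSchema and its base class, XmpSchema. I bet the PdfSchemaAdvanced in your library also derives from XmpSchema.

PdfSchema.AddKeywords does only one thing: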

base["pdf:Keywords"] = keywords;

and the XmpSchema indexer setter (XmpSchema.[].set) in turn does:

base[key] = XmpSchema.Escape(value);

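So it is clear that the value is being "escaped" to make sure special characters do not interfere with the storage format.

Now, the Escape function, as far as I can see, does a simple character-by-character scan and performs these substitutions: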

" -> &quot;
& -> &amp;
' -> &apos;
< -> &lt;
> -> &gt;

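That's all. It looks like typical HTML-entity processing, at least in my version of the library. So it would not duplicate the quotes; it only changes their encoding.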
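To make the point concrete, here is a minimal sketch of what such an escape routine might look like. It is my own re-implementation of the five substitutions listed above, not the library's source; the class name XmpEscapeSketch is made up for illustration.

```csharp
using System;
using System.Text;

public static class XmpEscapeSketch
{
    // Hypothetical stand-in for XmpSchema.Escape: a character-by-character
    // scan that applies only the five substitutions listed above.
    public static string Escape(string content)
    {
        var sb = new StringBuilder(content.Length);
        foreach (char c in content)
        {
            switch (c)
            {
                case '"':  sb.Append("&quot;"); break;
                case '&':  sb.Append("&amp;");  break;
                case '\'': sb.Append("&apos;"); break;
                case '<':  sb.Append("&lt;");   break;
                case '>':  sb.Append("&gt;");   break;
                default:   sb.Append(c);        break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string keywords = "\"keyword1\",\"keyword2\",\"keyword3\"";
        Console.WriteLine(Escape(keywords));
        // prints: &quot;keyword1&quot;,&quot;keyword2&quot;,&quot;keyword3&quot;
    }
}
```

Each quote turns into exactly one &quot; entity, so an escape step of this kind cannot by itself turn one quote into three.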

Then, AddRdfDescription seems to simply iterate over the stored keys and wrap them in tags, with no further processing. So it would emit something like:

Escaped"Contents&OfThis"Key

as:

<pdf:Keywords>Escaped&quot;Contents&amp;OfThis&quot;Key</pdf:Keywords>

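Besides the AddKeywords method, you should also see an AddProperty method. It behaves like AddKeywords, except that it takes the key as a parameter and does not Escape() its input value.

So, if you are perfectly sure that your _keywords is formatted properly, you might try: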

AddProperty("pdf:Keywords", _keywords)

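but I discourage you from doing that. At least in my version of itextsharp, the library seems to process 'keywords' properly and format them safely as RDF.

You might also try the PdfSchema class that I just checked instead of the Advanced one. I bet it is still present in the library.

Overall, though, I think the problem is elsewhere.

Double- or triple-check the contents of the _keywords variable, and then check the binary contents of the generated PDF. Open it in a hex editor or a simple plain-text editor such as Notepad and look for the <pdf:Keywords> tag. Check what it actually contains. Everything may be fine in the file, and it may be your PDF metadata reader that adds those quotes.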
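If you would rather do that check from code than in Notepad, something like the sketch below could work. It assumes the XMP packet is stored uncompressed in the file, so the tag can be found by a plain text search; the class and method names are made up for illustration. The CsvQuote helper is there only to test a guess of mine: CSV-style quoting (double every embedded quote, then wrap the field in quotes) turns your required string into exactly the triple-quoted string you are seeing, so a CSV-aware tool somewhere in the chain is one possible culprit.

```csharp
using System;
using System.IO;
using System.Text;

public static class KeywordsInspector
{
    // Returns the raw text between <pdf:Keywords> and </pdf:Keywords>,
    // or null if the tag is not found (e.g. the XMP stream is compressed).
    public static string FindKeywords(byte[] pdfBytes)
    {
        // ISO-8859-1 maps each byte to one char, so ASCII tags stay searchable.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        const string open = "<pdf:Keywords>";
        const string close = "</pdf:Keywords>";
        int start = text.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;
        start += open.Length;
        int end = text.IndexOf(close, start, StringComparison.Ordinal);
        return end < 0 ? null : text.Substring(start, end - start);
    }

    // CSV-style quoting: double embedded quotes, then wrap in quotes.
    public static string CsvQuote(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static void Main(string[] args)
    {
        if (args.Length > 0)
        {
            // args[0] is the path of the stamped PDF to inspect.
            string found = FindKeywords(File.ReadAllBytes(args[0]));
            Console.WriteLine(found ?? "(no <pdf:Keywords> tag found)");
        }

        string wanted = "\"keyword1\",\"keyword2\",\"keyword3\"";
        Console.WriteLine(CsvQuote(wanted));
        // prints: """keyword1"",""keyword2"",""keyword3"""
    }
}
```

If the tag in the file holds &quot;keyword1&quot;,&quot;keyword2&quot;,&quot;keyword3&quot;, the PDF itself is fine and the extra quotes come from whatever displays or exports the metadata. If the extra quotes are already in the file, look at what _keywords contains before it reaches AddKeywords.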
