Question

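We use an unmanaged DLL that has a function for replacing text in PDF documents (http://www.debenu.com/docs/pdf_library_reference/ReplaceTag.php). We are trying to migrate to a managed solution (ITextSharp or PdfSharp). I know this question has been asked before, and the answers were "you shouldn't do this" or "PDF doesn't easily support it". However, a solution exists that works for us; we just need to port it to C#. Any ideas on how I should approach this?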

Answer 1

According to your library reference link, you use the Debenu PDFLibrary function ReplaceTag. According to this Debenu knowledge base article:

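The ReplaceTag function simply replaces text in the page content stream, so for most documents it will have no effect. For some simple documents it may be able to replace content, but it really depends on how the PDF was built. Basically it does the same as:

    DPL.CombineContentStreams();
    string content = DPL.GetContentStreamToString();
    DPL.SetPageContentFromString(content.Replace("Moby", "Mary"));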

You can do the same with any general-purpose PDF library. It definitely works with iText(Sharp):

    using System.IO;
    using iTextSharp.text.pdf;

    // Replaces origText with replaceText in the raw content stream of page 1 only
    void VerySimpleReplaceText(string OrigFile, string ResultFile, string origText, string replaceText)
    {
        using (PdfReader reader = new PdfReader(OrigFile))
        {
            // Decoded content stream of the first page, as a string
            byte[] contentBytes = reader.GetPageContent(1);
            string contentString = PdfEncodings.ConvertToString(contentBytes, PdfObject.TEXT_PDFDOCENCODING);
            contentString = contentString.Replace(origText, replaceText);
            reader.SetPageContent(1, PdfEncodings.ConvertToBytes(contentString, PdfObject.TEXT_PDFDOCENCODING));

            // PdfStamper writes the modified document; Close() also closes the FileStream
            new PdfStamper(reader, new FileStream(ResultFile, FileMode.Create, FileAccess.Write)).Close();
        }
    }

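Warning: Just like the Debenu function, this code will have no effect on most documents, and it may even damage them. For some simple documents it may be able to replace content, but it really depends on how the PDF was built.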

By the way, this Debenu knowledge base article continues:

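If you create PDFs with Debenu Quick PDF Library and the standard fonts, the ReplaceTag function should work. However, a PDF may have been created by a tool that uses subsetted fonts, or even kerning, which splits words. In those PDFs the search text may not appear in the content in a simple form.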
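To make the kerning and subset-font point concrete, here is a minimal sketch using three hand-written content-stream fragments. The `Tj`/`TJ` operators are standard PDF syntax, but the strings and glyph IDs are made up for illustration, and `ContentStreamDemo` is a hypothetical helper. A plain `string.Replace`, which is essentially what both ReplaceTag and the iText snippet amount to, only finds the word in the first fragment.

```csharp
// Hypothetical helper for illustration; the fragments are hand-written, not
// taken from a real PDF.
public static class ContentStreamDemo
{
    // Simple case: the whole word is one literal string operand of Tj.
    public const string Simple = "BT /F1 12 Tf 72 700 Td (Moby) Tj ET";

    // Kerned case: the word is split into pieces of a TJ array with a kerning offset.
    public const string Kerned = "BT /F1 12 Tf 72 700 Td [(Mo) -20 (by)] TJ ET";

    // Subset-font case: glyph IDs written as a hex string, so the characters
    // "Moby" never appear in the stream (these IDs are invented).
    public const string Subset = "BT /F2 12 Tf 72 700 Td <0030005200450058> Tj ET";

    // The naive approach: replace the search text wherever it appears literally.
    public static string NaiveReplace(string content, string from, string to) =>
        content.Replace(from, to);
}
```

`NaiveReplace(Simple, "Moby", "Mary")` yields `BT /F1 12 Tf 72 700 Td (Mary) Tj ET`, while the kerned and subset fragments come back unchanged.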

In short, the ReplaceTag function is only usable in certain limited scenarios; it is not a function you can rely on to search and replace text.

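So if you also change how the source documents are created while moving to a managed solution, chances are that neither the Debenu PDFLibrary function ReplaceTag nor the code above will change the content as desired.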

Answer 2

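For PdfSharp users, here is a somewhat usable function that I copied from my project. It relies on a utility method that other methods also consume. That is why it has a bool result that goes unused here: the callback always returns false, so every stream is visited.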

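It ignores whitespace created by kerning. Depending on the source material, this can garble the result: the replacement text is drawn as one run, so all its characters end up in the same space.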
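The whitespace-tolerant matching described here can be tried out on plain strings, without PdfSharp. `KerningTolerantSearch` below is a hypothetical helper written for illustration. Its one addition is `Regex.Escape`, so that characters such as `(` or `.` in the search text are matched literally.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the "\s*"-joined search pattern, applied to a plain string.
public static class KerningTolerantSearch
{
    // "Moby" -> M\s*o\s*b\s*y ; each character is escaped for the regex engine.
    public static string BuildPattern(string source) =>
        string.Join(@"\s*", source.Select(c => Regex.Escape(c.ToString())));

    // "$" is doubled so the target is inserted literally, not as a group reference.
    public static string Replace(string content, string source, string target) =>
        Regex.Replace(content, BuildPattern(source), target.Replace("$", "$$"));
}
```

`Replace("(M o by) Tj", "Moby", "Mary")` gives `(Mary) Tj`, because the whitespace between the original characters is swallowed by the match. `Replace("[(Mo) -20 (by)] TJ", "Moby", "Mary")` returns its input unchanged, because the kerning offset and string delimiters are not whitespace.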

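    using System;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using PdfSharp.Pdf;
    using PdfSharp.Pdf.Advanced;

    public static void ReplaceTextInPdfPage(PdfPage contentPage, string source, string target)
    {
        // Latin-1 maps each byte 0-255 to one char and back, so the stream bytes survive
        // the round trip (Encoding.Default is UTF-8 on .NET Core and would corrupt them)
        var latin1 = Encoding.GetEncoding("ISO-8859-1");
        ModifyPdfContentStreams(contentPage, stream =>
        {
            // Skip streams whose filters PdfSharp cannot decode
            if (!stream.TryUnfilter())
                return false;
            // Search text with optional whitespace between characters, each char escaped
            var search = string.Join("\\s*", source.Select(c => Regex.Escape(c.ToString())));
            var stringStream = latin1.GetString(stream.Value, 0, stream.Length);
            if (!Regex.IsMatch(stringStream, search))
                return false;
            // "$" is doubled so the target is inserted literally, not as a group reference
            stringStream = Regex.Replace(stringStream, search, target.Replace("$", "$$"));
            stream.Value = latin1.GetBytes(stringStream);
            stream.Zip();
            // false = continue with the remaining streams
            return false;
        });
    }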


    // Calls Modification on each content stream of the page, then on each XObject
    // stream in the page resources; stops as soon as Modification returns true
    public static void ModifyPdfContentStreams(PdfPage contentPage, Func<PdfDictionary.PdfStream, bool> Modification)
    {
        for (var i = 0; i < contentPage.Contents.Elements.Count; i++)
            if (Modification(contentPage.Contents.Elements.GetDictionary(i).Stream))
                return;
        // Form XObjects carry their own content streams, so the text may be drawn there
        var resources = contentPage.Elements?.GetDictionary("/Resources");
        var xObjects = resources?.Elements.GetDictionary("/XObject");
        if (xObjects == null)
            return;
        foreach (var item in xObjects.Elements.Values.OfType<PdfReference>())
        {
            var stream = (item.Value as PdfDictionary)?.Stream;
            if (stream != null)
                if (Modification(stream))
                    return;
        }
    }
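A content stream is bytes, not text. Whichever way it is read, converting it to a .NET string and back must reproduce every byte value unchanged. Otherwise binary data in the stream, such as inline images or non-ASCII bytes in string operands, is corrupted when it is written back. Below is a minimal stdlib-only check (`ByteRoundTrip` is a hypothetical helper), assuming .NET Core / .NET 5 or later, where `Encoding.Default` is UTF-8.

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper: checks whether an encoding preserves arbitrary bytes.
public static class ByteRoundTrip
{
    // True if decoding all byte values 0..255 and re-encoding them
    // yields exactly the original bytes.
    public static bool RoundTrips(Encoding encoding)
    {
        byte[] all = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        byte[] back = encoding.GetBytes(encoding.GetString(all));
        return back.SequenceEqual(all);
    }
}
```

`RoundTrips(Encoding.GetEncoding("ISO-8859-1"))` is true. `RoundTrips(Encoding.UTF8)` is false, because lone bytes of 0x80 and above are not valid UTF-8 and get replaced on decoding. This is why Latin-1 is the safer choice for byte-level string manipulation like this.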

Replace strings in a PDF document (ITextSharp or PdfSharp)
