Question

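My requirements

1) I need to identify a specific text pattern.
2) Then I need to replace that text pattern with a predefined text value, keeping the same formatting as the matched pattern, such as font, font color, bold...

3) I am able to identify the text and replace it with the predefined value, but writing the result to a PDF fails.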
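The replaceFields helper called in the code further down is not shown in the question. As a rough illustration of steps 1) and 2) on plain strings only, here is a minimal sketch. It assumes a hypothetical placeholder syntax of the form ${fieldName} and a hypothetical map of predefined values, neither of which comes from the question, and it takes the value map as a second parameter, unlike the one-argument helper used below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldReplacer {
    // Hypothetical placeholder syntax: ${fieldName}.
    // The question does not say what its text patterns look like.
    private static final Pattern FIELD = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every known placeholder with its predefined value.
    public static String replaceFields(String text, Map<String, String> values) {
        Matcher matcher = FIELD.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Placeholders without a predefined value are left untouched.
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "Alice");
        // prints: Dear Alice, ref ${id}
        System.out.println(replaceFields("Dear ${name}, ref ${id}", values));
    }
}
```

This only covers the string side of the problem; it says nothing about fonts, colors, or how the text is stored in the PDF.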

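I tried the following 2 approaches to write the PDF:

1) By overriding writeString(String string, List textPositions) of PDFTextStripper

2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(...)

Result of approach 1 - by overriding writeString

The file generated by this code does not open as a PDF. I can open it in Word, but it does not have the formatting of the original text.

Result of approach 2 - by using cosArray.add or cosArray.set(...)

I only see boxes in the generated PDF.

Code for approach 1 - by overriding writeString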

public void rewrite(String templatePDFPath) throws IOException {
    PDDocument document = null;
    Writer pdfWriter = null;

    try {
        File templateFile = new File(templatePDFPath);
        document = PDDocument.load(templateFile);

        this.setSortByPosition(true);
        this.setStartPage(0);
        this.setEndPage(document.getNumberOfPages());

        // The extracted (and replaced) text is written to this file as plain text
        pdfWriter = new PrintWriter(Utils.getFilePathWithTimeStamp(templatePDFPath).toString());

        this.writeText(document, pdfWriter);
    } finally {
        if (document != null) {
            document.close();
        }
        if (pdfWriter != null) {
            pdfWriter.close();
        }
    }
}

@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
    for (int i = 0; i < textPositions.size(); i++) {
        TextPosition text = textPositions.get(i);
        String currentCharacter = text.getUnicode();
        // Debug output, disabled:
        // System.out.println("String[" + text.getXDirAdj() + "," + text.getYDirAdj()
        //         + " fs=" + text.getFontSize() + " xscale=" + text.getXScale()
        //         + " height=" + text.getHeightDir() + " space=" + text.getWidthOfSpace()
        //         + " width=" + text.getWidthDirAdj() + "]" + currentCharacter);
    }

    String replacedString = replaceFields(string.trim());

    // Only strings in which a field was replaced are written to the output
    if (!string.equals(replacedString)) {
        System.out.println("Field " + string + " is replaced by value " + replacedString);
        // super.writeString(replacedString, textPositions);
        super.writeString(replacedString);
    }
}

Code for approach 2 - by using cosArray.add or cosArray.set(...)

public List<String> replaceFieldsInCosArray(COSArray cosArray) {
    List<String> replacedStrings = new ArrayList<String>();
    String stringsOfCOSArray = "";

    // Concatenate all string elements of the array
    for (int cosArrayIndex = 0; cosArrayIndex < cosArray.size(); cosArrayIndex++) {
        Object cosObject = cosArray.get(cosArrayIndex);

        if (cosObject instanceof COSString) {
            COSString cosString = (COSString) cosObject;
            stringsOfCOSArray += cosString.getString();
        }
    }
    stringsOfCOSArray = stringsOfCOSArray.trim();

    //cosArray.clear();

    String replacedField = this.replaceFields(stringsOfCOSArray);
    System.out.println("cosText:" + stringsOfCOSArray + ":replacedField:" + replacedField);

    cosArray.add(new COSString(replacedField));

    if (!stringsOfCOSArray.equals(replacedField)) {
        replacedStrings.add(replacedField);
    }

    return replacedStrings;
}


Answer 1

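1) By overriding writeString(String string, List textPositions) of PDFTextStripper

The PDFTextStripper is a utility for extracting plain text. Thus, it is not surprising that your output does not open as a PDF. Furthermore, Word can open it because Word recognizes it as plain text and opens it as such.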
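To make this concrete: a PDF file starts with the marker "%PDF-" followed by a version number, while a file written through a PrintWriter contains nothing but the characters passed to it. The following is only a simplified sketch using the JDK alone; the class and method names are made up for illustration, and a real validity check would involve far more than the first five bytes.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfHeaderCheck {
    // Returns true if the content begins with the PDF header marker "%PDF-".
    public static boolean startsWithPdfHeader(byte[] content) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (content.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (content[i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Mimics approach 1: plain text written to a file that merely has a .pdf name.
        Path out = Files.createTempFile("stripper-output", ".pdf");
        try (PrintWriter writer = new PrintWriter(out.toFile())) {
            writer.print("Dear Alice, ...");
        }
        // prints: false
        System.out.println(startsWithPdfHeader(Files.readAllBytes(out)));
        Files.delete(out);
    }
}
```

The file extension plays no role here; what matters is that the content is text only, with none of the PDF structure around it.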

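2) By using cosArray.add(new COSString(replacedField)); or cosArray.set(...)

It is not really clear what you mean here. In particular, which cosArray are you talking about?

One might assume you mean the argument of the TJ operator, but there are multiple reasons speaking against that assumption:

The TJ operator is only one of several text-showing operators, and the only one accepting an array argument; thus, you would only see some of the operators in question;
your code would assume that the whole text pattern you try to recognize is drawn by the same operation; why should it be?
you seem to assume that cosString.getString() returns something intelligible; alas, in general this is not the case, only if the font in question uses a standard encoding, which is becoming less and less common;
furthermore, you assume that the glyphs for the replacement text are included in the font of the replaced text. Why should they be? Embedded font subsets are becoming more and more common...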
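The last two points can be illustrated with a deliberately simplified toy model of an embedded font subset. This is not PDFBox API; the class, its methods, and the code assignments are hypothetical and use only the JDK. The assumption is a font that assigns its own character codes to the few glyphs the document actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SubsetFontModel {
    // Toy model: character code -> Unicode text, for the embedded glyphs only.
    private final Map<Integer, String> codeToUnicode = new LinkedHashMap<>();

    public SubsetFontModel add(int code, String unicode) {
        codeToUnicode.put(code, unicode);
        return this;
    }

    // Decodes string bytes through the font's own mapping.
    public String decode(byte[] codes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : codes) {
            String s = codeToUnicode.get(b & 0xFF);
            sb.append(s != null ? s : "\uFFFD"); // unknown code
        }
        return sb.toString();
    }

    // True only if every character of the text has a glyph in the subset.
    public boolean canShow(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!codeToUnicode.containsValue(String.valueOf(text.charAt(i)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SubsetFontModel font = new SubsetFontModel()
                .add(1, "N").add(2, "a").add(3, "m").add(4, "e");
        byte[] raw = {1, 2, 3, 4};

        // Reading the raw bytes without the font's mapping yields control characters.
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1).equals("Name")); // false
        System.out.println(font.decode(raw));      // Name
        System.out.println(font.canShow("Name"));  // true
        System.out.println(font.canShow("Alice")); // false: no glyphs for A, l, i, c
    }
}
```

In this model, a replacement value such as "Alice" simply cannot be drawn with the existing font, which would be consistent with the boxes seen in the output of approach 2.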

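Thus, what do you mean here?

All that being said, if you happen to work only with naive PDFs, you might want to look at the answers to the question @Tilmann pointed to. There is a small subset of PDFs for which the code there might work.

If your PDFs happen to be more complex, even describing the approach would go beyond the scope of a single Stack Overflow answer.

By the way, your requirement is not well defined, in particular

Replace that text pattern with a predefined text value, keeping the same formatting as the matched pattern, such as font, font color, bold...

If the matched text has three letters and the replacement has two, and the first glyph of the found occurrence is red, the second green, and the third blue, how should the two replacement glyphs be drawn using those three colors?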

Unable to replace text in a PDF using PDFBox 2.0.2
