Question

Let us consider this code:

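// Imports assume PDFBox 1.8.x (the version that still has COSVisitorException)
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class Test1 {

    public static void CreatePdf(String src) throws IOException, COSVisitorException {
        PDRectangle rec = new PDRectangle(400, 400);
        PDDocument document = new PDDocument();
        PDPage page = new PDPage(rec);
        document.addPage(page);

        PDDocumentInformation info = document.getDocumentInformation();
        PDStream stream = new PDStream(document);
        info.setAuthor("PdfBox");
        info.setCreator("Pdf");
        info.setSubject("Stéganographie");
        info.setTitle("Stéganographie dans les documents PDF");
        info.setKeywords("Stéganographie, pdf");

        PDPageContentStream content = new PDPageContentStream(document, page, true, false);
        PDFont font = PDType1Font.HELVETICA;

        String hex = "4C0061f";  // shows "La"
        // Notice that we have 00 between 4C and 61, where 00 = null character

        // Turn each pair of hex digits into one character
        StringBuilder sb = new StringBuilder();
        for (int count = 0; count < hex.length() - 1; count += 2) {
            String output = hex.substring(count, count + 2);
            int decimal = Integer.parseInt(output, 16);
            sb.append((char) decimal);
        }
        String tt = sb.toString();

        // Write the text with raw content stream operators
        content.beginText();
        content.setFont(font, 12);
        content.appendRawCommands("15 385 Td\n");
        content.appendRawCommands("(" + tt + ")" + "Tj\n");
        content.endText();
        content.close();

        document.save("doc.pdf");
        document.close();
    }
}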

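My question is: why is the null character "00" replaced by a space in the PDF document? Note that the width of this null character is 0.0, yet it shows up as a space in the PDF document! As a result, I get "L a" instead of "La".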

Answer 1

Why is the null character "00" replaced by a space in the PDF document?

If you look into the PDF, you will find that the font used for the text is defined as:

9 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
>>
endobj

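So you are using a font with WinAnsiEncoding. If you look at the definition of that encoding in Annex D of the PDF specification, you will see that none of the codes below 32 (decimal) are mapped to anything. You are therefore using a character that is not defined in the encoding at hand, so there is no defined behavior; Acrobat Reader seems to use a positive width for those undefined code points.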
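To make this concrete, here is a minimal sketch in plain Java (no PDFBox required) that decodes the hex string from the question the same way its loop does and flags every code below 32. The class and method names are made up for illustration, and the check only encodes the statement above about codes below 32; it is not a complete WinAnsiEncoding table.

```java
import java.util.ArrayList;
import java.util.List;

class UnmappedCodes {

    // Decodes pairs of hex digits into character codes, like the loop in the question.
    // A trailing odd digit (the final "f" in "4C0061f") is silently ignored.
    static List<Integer> decodeHex(String hex) {
        List<Integer> codes = new ArrayList<Integer>();
        for (int i = 0; i < hex.length() - 1; i += 2) {
            codes.add(Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return codes;
    }

    // Returns the codes below 32 (decimal), i.e. the ones described above as unmapped.
    static List<Integer> codesBelow32(List<Integer> codes) {
        List<Integer> result = new ArrayList<Integer>();
        for (int code : codes) {
            if (code < 32) {
                result.add(code);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> codes = decodeHex("4C0061f");
        System.out.println("codes: " + codes);
        System.out.println("codes below 32: " + codesBelow32(codes));
    }
}
```

Any code this reports is one for which the viewer is free to pick its own behavior.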

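If you want to make sure that your hidden characters cause no displacement at all, you should add an explicit Widths array to the font dictionary (see section 9.6.2 of the PDF specification) and make sure your invisible characters have a width of 0. (By the way, you will also see there that leaving out the Widths array, as PDFBox does, has been deprecated for years.)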
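As a rough illustration of how FirstChar, LastChar, and Widths fit together according to section 9.6.2: the Widths array holds LastChar - FirstChar + 1 entries, and the width for character code c is the entry at index c - FirstChar, in units of 1/1000 of text space. The sketch below is plain Java with hypothetical names. It assumes 556 as the Helvetica width of both L and a, and it ignores character spacing, word spacing, and horizontal scaling.

```java
class WidthsLookup {

    // Width of a character code in glyph space units (1/1000 of text space).
    // Codes outside the range covered by the array get 0 in this sketch; a real
    // viewer may instead fall back to other metrics it knows about.
    static float widthOf(int code, int firstChar, float[] widths) {
        int index = code - firstChar;
        if (index < 0 || index >= widths.length) {
            return 0f;
        }
        return widths[index];
    }

    // Horizontal advance of a sequence of codes at the given font size.
    static float advance(int[] codes, int firstChar, float[] widths, float fontSize) {
        float total = 0f;
        for (int code : codes) {
            total += widthOf(code, firstChar, widths) / 1000f * fontSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // FirstChar 0, LastChar 127; every entry starts out as 0
        float[] widths = new float[128];
        widths['L'] = 556f; // assumed Helvetica width
        widths['a'] = 556f; // assumed Helvetica width

        int[] withNull = {0x4C, 0x00, 0x61};
        int[] plain = {0x4C, 0x61};
        System.out.println("advance with 00:    " + advance(withNull, 0, widths, 12f));
        System.out.println("advance without 00: " + advance(plain, 0, widths, 12f));
    }
}
```

With an explicit zero entry for code 0, both sequences advance by the same amount, which is the effect the Widths array is meant to guarantee, provided the viewer honors it.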

Note that the width of this null character is 0.0

Once you are in undefined territory, anything can happen, and different programs will make different assumptions.

PS: Some code... Between your lines

font= PDType1Font.HELVETICA;

and

String hex = "4C0061f";  // shows "La"

I added the following code:

InputStream afmStream = ResourceLoader.loadResource("org/apache/pdfbox/resources/afm/Helvetica.afm");
AFMParser afmParser = new AFMParser(afmStream);
afmParser.parse();
FontMetric afmMetrics = afmParser.getResult();
List<Float> newWidths = new ArrayList<Float>();
for (CharMetric charMetric : afmMetrics.getCharMetrics())
{
    if (charMetric.getCharacterCode() < 0)
        continue;
    while (charMetric.getCharacterCode() >= newWidths.size())
        newWidths.add(0f);
    newWidths.set(charMetric.getCharacterCode(), charMetric.getWx());
}
font.setFirstChar(0);
font.setLastChar(newWidths.size() - 1);
font.setWidths(newWidths);

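This code should read the Helvetica.afm font metrics resource included in PDFBox and create the FirstChar, LastChar, and Widths entries from it. It works fine here, but if it does not in your installation, simply extract the afm file from the PDFBox jar and read it using a FileInputStream.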
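For readers without PDFBox at hand, the same gap-filling idea can be sketched in plain Java. The snippet assumes the usual AFM character metric line layout (C code ; WX width ; N name ; B bounding box ;) and uses three sample lines typed in by hand, with placeholder bounding boxes, rather than the real Helvetica.afm. The class and method names are hypothetical. Lines with code -1 stand for glyphs that have no code in the font's built-in encoding; they are skipped, mirroring the getCharacterCode() < 0 check above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AfmWidths {

    // Builds a widths list indexed by character code from AFM character metric lines.
    // Codes that never occur keep the default width 0.
    static List<Float> buildWidths(List<String> charMetricLines) {
        List<Float> widths = new ArrayList<Float>();
        for (String line : charMetricLines) {
            int code = -1;
            float wx = 0f;
            for (String field : line.split(";")) {
                String[] parts = field.trim().split("\\s+");
                if (parts.length < 2) {
                    continue;
                }
                if (parts[0].equals("C")) {
                    code = Integer.parseInt(parts[1]);
                } else if (parts[0].equals("WX")) {
                    wx = Float.parseFloat(parts[1]);
                }
            }
            if (code < 0) {
                continue; // unencoded glyph
            }
            while (code >= widths.size()) {
                widths.add(0f); // fill gaps with zero widths
            }
            widths.set(code, wx);
        }
        return widths;
    }

    public static void main(String[] args) {
        // Hand-written sample lines; the bounding box values are placeholders
        List<String> lines = Arrays.asList(
                "C 32 ; WX 278 ; N space ; B 0 0 0 0 ;",
                "C 76 ; WX 556 ; N L ; B 0 0 0 0 ;",
                "C -1 ; WX 556 ; N eacute ; B 0 0 0 0 ;");
        List<Float> widths = buildWidths(lines);
        System.out.println("FirstChar = 0, LastChar = " + (widths.size() - 1));
        System.out.println("width of code 0:  " + widths.get(0));
        System.out.println("width of code 76: " + widths.get(76));
    }
}
```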

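For some reason, the 00 character still seems to be treated as having some width, but other characters below 32 (decimal) can be used without problems, e.g.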

String hex = "4C0461f";

shows "La" without a gap. If I interpret your former (now deleted) question about 1C and 1D correctly, this may already help you move on.

PPS: Concerning the question in the comments:

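Can you tell me all the disadvantages of this approach? Why does it not work with accented characters, e.g. (Lé)? Your code only works with unaccented characters, but when we have an accent, we get Lé instead of Le .. I just want to know what the disadvantages of your code are :)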

I cannot tell you all of them (as I am not really that deep into font matters), but in essence the approach above is somewhat incomplete.

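As mentioned at the start, you are using a font with WinAnsiEncoding, in which none of the codes below 32 (decimal) are mapped to anything. By adding FirstChar, LastChar, and Widths entries, we try to define zero widths for the characters with codes below 32.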

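Nonetheless, we have neither taken care of the encoding information for those codes (the encoding is still pure WinAnsiEncoding) nor considered whether the font actually contains any information for them. Furthermore, to make matters even less controllable, we are talking about Helvetica, i.e. one of the standard 14 fonts that PDF viewers have to bring along themselves. Whenever the explicitly given information and the information the viewer brings along contradict each other, the PDF viewer may well prefer its own.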

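Why do accented characters in particular cause problems? I am not sure. My guess, though, is that it has to do with the fact that fonts often do not carry accented characters as separate entities but instead combine an accent with the unaccented character. Maybe the font used internally by the viewer maps these combining characters to code points below 32, so that the display becomes quirky when your explicit codes below 32 and the font's implicit use of such codes collide.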

Essentially, I would generally advise against doing this. For ordinary PDF documents it is not necessary at all.

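In your case, as you have titled your document Stéganographie dans les documents PDF, you obviously want to hide information in PDFs somehow. Using invisible, non-printable characters looks like one way to do so, so you may want to experiment in that direction. But PDF offers many ways to put any amount of information into a document without it being displayed directly.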

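Thus, depending on your specific goal, I think other approaches would hide the information more safely, e.g. private PieceInfo sections or custom entries in some other dictionaries...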

Answer 2

Final code:

// Imports assume PDFBox 1.8.x together with its FontBox dependency
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.fontbox.afm.AFMParser;
import org.apache.fontbox.afm.CharMetric;
import org.apache.fontbox.afm.FontMetric;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.util.ResourceLoader;

public class Test4 {

    public static final String src = "...";

    public static void CreatePdf(String src) throws IOException, COSVisitorException {
        PDRectangle rec = new PDRectangle(400, 400);
        PDDocument document = new PDDocument();
        PDPage page = new PDPage(rec);
        document.addPage(page);
        PDPageContentStream canvas = new PDPageContentStream(document, page, true, false);
        PDFont font = PDType1Font.HELVETICA;
        String hex = "4C1D61f";

        // Read the Helvetica metrics bundled with PDFBox
        InputStream afmStream = ResourceLoader.loadResource("org/apache/pdfbox/resources/afm/Helvetica.afm");
        AFMParser afmParser = new AFMParser(afmStream);
        afmParser.parse();
        FontMetric afmMetrics = afmParser.getResult();

        // Build the widths list; codes without a metric keep the width 0
        List<Float> newWidths = new ArrayList<Float>();
        for (CharMetric charMetric : afmMetrics.getCharMetrics()) {
            if (charMetric.getCharacterCode() < 0)
                continue;
            while (charMetric.getCharacterCode() >= newWidths.size())
                newWidths.add(0f);
            newWidths.set(charMetric.getCharacterCode(), charMetric.getWx());
        }

        font.setFirstChar(0);
        font.setLastChar(newWidths.size() - 1);
        font.setWidths(newWidths);

        // Turn each pair of hex digits into one character
        StringBuilder sb = new StringBuilder();
        for (int count = 0; count < hex.length() - 1; count += 2) {
            String output = hex.substring(count, count + 2);
            int decimal = Integer.parseInt(output, 16);
            sb.append((char) decimal);
        }
        String tt = sb.toString();

        canvas.beginText();
        canvas.setFont(font, 12);
        canvas.appendRawCommands("15 385 Td\n");
        canvas.appendRawCommands("(" + tt + ")" + "Tj\n");
        canvas.endText();
        canvas.close();

        document.save("doc.pdf");
        document.close();
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, COSVisitorException {
        CreatePdf(src);
    }
}

Inserting a NULL character with PDFBox
