Writing Arabic characters with PDFBox

Time: 2014-09-25 12:58:25

Tags: java pdfbox

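  1. Update 1
  2. I am trying to use PDFBox to write some Arabic characters into a PDF document, but the output contains strange characters instead. The code snippet I used for testing is below. Note that the same code prints Latin characters without any problem.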

    public static void main(String[] args) throws Exception {
    
        PDDocument document = new PDDocument();
    
        PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
        document.addPage(page);
    
        PDPageContentStream stream = new PDPageContentStream(document, page,true, true);
    
        //Use of a unicode font
        PDFont font = PDTrueTypeFont.loadTTF(document,"C:/arialuni.ttf");
    
        font.setFontEncoding(new WinAnsiEncoding());
    
        stream.setFont(font, 12);
        stream.beginText();
    
        stream.moveTextPositionByAmount(40, 600);
    
        stream.drawString("سي ججس ححسيب حسججسيبنم حح ");
        stream.endText();
        stream.close();
        document.save("c:\\resultpdf.pdf");
        document.close();
    
    }
    

    Thanks for your help. I also tried a Unicode font downloaded from the Microsoft website, but I still get the same result.

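    1. Update 2
    2. Using the `drawUnicodeString` and `loadTTF` methods that I got from PDFBOX-922, I was able to write Arabic characters, but they are disconnected and ordered from left to right. Here are the two methods, `drawUnicodeString` and `loadTTF`: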

      public void drawUnicodeString(String text) throws IOException {
          COSString string = new COSString();
          for (int i = 0; i < text.length(); i++) {
              char c = text.charAt(i);
              string.append(c >> 8);
              string.append(c & 0xff);
          }
          ByteArrayOutputStream buffer = new ByteArrayOutputStream();
          string.writePDF(buffer);
          appendRawCommands(buffer.toByteArray());
          appendRawCommands(32);
          appendRawCommands(getISOBytes("Tj\n"));
      }
      
      
      public static PDType0Font loadTTF(PDDocument doc, InputStream is)
              throws IOException {
          /* Load the font which we will convert to Type0 font. */
          PDTrueTypeFont pdTtf = PDTrueTypeFont.loadTTF(doc, is);
      
          TrueTypeFont ttf = pdTtf.getTTFFont();
          CMAPEncodingEntry unicodeMap = null;
          for (CMAPEncodingEntry candidate : ttf.getCMAP().getCmaps()) {
              if (candidate.getPlatformId() == CMAPTable.PLATFORM_WINDOWS
                      && candidate.getPlatformEncodingId() == CMAPTable.ENCODING_UNICODE) {
                  unicodeMap = candidate;
                  break;
              }
          }
          if (unicodeMap == null) {
              throw new RuntimeException(
                      "To use as CIDFont, the TTF must have a Windows platform Unicode encoding");
          }
          float scaling = 1000f / ttf.getHeader().getUnitsPerEm();
      
          MyPDCIDFontType2Font pdCidFont2 = new MyPDCIDFontType2Font();
          pdCidFont2.setBaseFont(pdTtf.getBaseFont());
          pdCidFont2.setFontDescriptor((PDFontDescriptorDictionary) pdTtf
                  .getFontDescriptor());
          /* Fixme -- should determine the minimum and maximum charcode in the map */
          int[] cid2gid = new int[65536];
          List<Float> widths = new ArrayList<Float>();
          int[] widthValues = ttf.getHorizontalMetrics().getAdvanceWidth();
          for (int i = 0; i < cid2gid.length; i++) {
              int glyph = unicodeMap.getGlyphId(i);
              cid2gid[i] = glyph;
              widths.add((float) i);
              widths.add((float) i);
              widths.add(widthValues[glyph] * scaling);
          }
          pdCidFont2.setCidToGid(cid2gid);
          pdCidFont2.setWidths(widths);
          pdCidFont2.setDefaultWidth(widths.get(0).longValue());
      
          /* Now construct the type0 font that we actually return */
          myType0Font pdFont0 = new myType0Font();
          pdFont0.setDescendantFont(pdCidFont2);
          pdFont0.setDescendantFonts(new COSObject(pdCidFont2.getCOSObject()));
          pdFont0.setEncoding(COSName.IDENTITY_H);
      
          pdFont0.setBaseFont(pdTtf.getBaseFont());
      
          // pdfont0.setToUnicode(COSName.IDENTITY_H); XXX how to express identity
          // mapping as ToUnicode program? */
          return pdFont0;
      }
      

      Here are the printed characters:

      [Image: disconnected Arabic letters]

      I don't know why these characters are disconnected.

2 answers:

Answer 0 (score: 5)

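Arabic can be written by applying the patches from PDFBOX-922 and PDFBOX-1287 (the diff files are attached to the issues). I hope the patches will be applied in version 2.0.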

Answer 1 (score: 3)

I suggest you try adding the ICU4J jar to your project: ICU4J
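
As a rough illustration of the ordering half of the problem: the `drawUnicodeString` method in the question emits characters in the order they appear in the Java string (logical order), so a right-to-left run ends up drawn left to right. The sketch below uses only the JDK's `java.text.Bidi` to convert a string to visual order before it is drawn. The class and method names (`VisualOrderSketch`, `toVisualOrder`) are made up for this example. It does not join the letters: contextual shaping needs something like ICU4J's `com.ibm.icu.text.ArabicShaping`, which is presumably why this answer suggests the library. It also ignores bracket mirroring and combining marks, so treat it as a sketch under those assumptions and not as a full solution.

```java
import java.text.Bidi;

// Hypothetical helper, not part of PDFBox or ICU4J.
final class VisualOrderSketch {

    // Converts logical-order text to visual (left-to-right drawing) order.
    // Limitations: no letter shaping, no bracket mirroring, and combining
    // marks inside reversed runs are not kept after their base character.
    static String toVisualOrder(String logical) {
        // Base direction is taken from the first strong character,
        // falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);

        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];

        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // An odd embedding level marks a right-to-left run:
            // reverse its characters so they can be drawn left to right.
            boolean rightToLeft = (levels[i] % 2) == 1;
            runs[i] = rightToLeft ? new StringBuilder(run).reverse().toString() : run;
        }

        // Put the runs themselves into visual order (in place).
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);

        StringBuilder visual = new StringBuilder(logical.length());
        for (String run : runs) {
            visual.append(run);
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        // U+0633 U+0644 U+0627 U+0645 in logical order.
        String logical = "\u0633\u0644\u0627\u0645";
        System.out.println(toVisualOrder(logical));
    }
}
```

The returned string could then be passed to `drawUnicodeString` in place of the original text. Whether the result renders correctly still depends on the font and on shaping being applied as well.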