Find whether a PDF page is B&W or color using Java

Time: 2012-03-06 18:00:56

Tags: java pdf colors

Related: How do I know if PDF pages are color or black-and-white?

I need to find out, using Java, whether the current page is in color or black and white.

I tried PDFBox, doing the following:

public void checkColor(final File pdfFile) {
    PDDocument document;
    try {
        document = PDDocument.load(pdfFile);

        List<PDPage> pages = document.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++) {
            System.out.println();
            PDPage page = pages.get(i);
            //BufferedImage image = page.convertToImage();
            BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 72);

            parseColor(image, i);
        }

        printPages();

    } catch (IOException ex) {
        Logger.getLogger(PdfBoxParser.class.getName()).log(Level.SEVERE, null, ex);
    }
}
public static boolean isColorPixel(final int pixel) {
    // taken from a Stack Overflow post
    System.out.print(pixel);
    System.out.print(",");
    int alpha = (pixel >> 24) & 0xff;
    int red = (pixel >> 16) & 0xff;
    int green = (pixel >> 8) & 0xff;
    int blue = (pixel) & 0xff;
    // gray: R = G = B
    boolean gray = (red == green && green == blue);
    // a pixel counts as color only when it is NOT gray
    return !gray;
}

protected void parseColor(BufferedImage pImage, int pPageNumber) {
    int thresholdColor = Main.COLOR_THRESHOLD_PER_PAGE;
    for (int h = 0; h < pImage.getHeight(); h++) {
        for (int w = 0; w < pImage.getWidth(); w++) {
            int pixel = pImage.getRGB(w,h);
            boolean color = Main.isColorPixel(pixel);
            if (color) {
                thresholdColor--;
                if (thresholdColor == 0) {
                //do something like store this page number...
                .
                .
                .
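
The counting loop above is cut off, so here is a minimal, self-contained sketch of the same idea: count non-gray pixels and flag the page once a threshold is reached. It works on any BufferedImage and does not depend on PDFBox. The class name ColorPageCheck, the method names, and the tolerance parameter are hypothetical and not part of the original code. The tolerance is an assumption that may help when anti-aliasing or compression leaves near-gray pixels with slightly unequal channels.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the original post.
final class ColorPageCheck {

    // True when the R, G and B channels differ by more than `tolerance`,
    // i.e. the pixel is not (approximately) gray.
    static boolean isColorPixel(int argb, int tolerance) {
        int red = (argb >> 16) & 0xff;
        int green = (argb >> 8) & 0xff;
        int blue = argb & 0xff;
        int max = Math.max(red, Math.max(green, blue));
        int min = Math.min(red, Math.min(green, blue));
        return max - min > tolerance;
    }

    // True as soon as `threshold` color pixels have been seen; stops scanning early.
    static boolean isColorPage(BufferedImage image, int tolerance, int threshold) {
        int remaining = threshold;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (isColorPixel(image.getRGB(x, y), tolerance) && --remaining <= 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Synthetic 10x10 white "page" standing in for a rendered PDF page.
        BufferedImage page = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < page.getHeight(); y++) {
            for (int x = 0; x < page.getWidth(); x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println("white page is color: " + isColorPage(page, 8, 1));

        page.setRGB(3, 3, 0xFF0000); // one pure red pixel
        System.out.println("page with a red pixel is color: " + isColorPage(page, 8, 1));
    }
}
```

Returning early keeps the scan cheap on pages that are obviously in color. A threshold above 1 avoids flagging a page because of a handful of stray colored pixels.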

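The problem is that I have tried various PDFs (e-books, single-page PDFs, etc.), and every "final int pixel" comes back as -1, along with a bunch of warnings (org.apache.pdfbox.util.PDFStreamEngine processOperator: unsupported/disabled operation: i / EMC / BMC / ri). Can this be solved?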
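
One observation on the -1 values: BufferedImage.getRGB returns a packed ARGB int, and -1 is 0xFFFFFFFF, which decodes to fully opaque white. So a run of -1 values means those pixels are white in the rendered image, not that getRGB failed. The sketch below only illustrates the decoding. The class and method names are hypothetical, and it does not establish whether the warnings are what left the page white.

```java
// Hypothetical helper, not part of the original post.
final class PixelDecode {

    // Splits a packed ARGB int (the format returned by BufferedImage.getRGB)
    // into {alpha, red, green, blue}, each in the range 0..255.
    static int[] toArgb(int pixel) {
        return new int[] {
            (pixel >> 24) & 0xff,
            (pixel >> 16) & 0xff,
            (pixel >> 8) & 0xff,
            pixel & 0xff
        };
    }

    public static void main(String[] args) {
        int[] c = toArgb(-1);
        // Prints: alpha=255 red=255 green=255 blue=255
        System.out.printf("alpha=%d red=%d green=%d blue=%d%n", c[0], c[1], c[2], c[3]);
    }
}
```

Since white satisfies R = G = B, it is a gray pixel. Writing the rendered image to a file (for example with javax.imageio.ImageIO.write) and opening it would show whether the whole page came out blank or only the margins are white.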

0 answers:

No answers yet