Converting a PDF to CSV with iText

Date: 2012-08-03 08:54:36

Tags: java pdf itext

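I am trying to use the iText framework to convert a PDF file to CSV so that it can be imported into Excel.

The output is garbled. I presume I am missing a step in the format conversion, but I cannot find the relevant information on the iText website and am looking for help.

My current code is below.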

package com.pdf.convert;

import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

public class ThirdPDF {

    private static String INPUTFILE = "/location/test.pdf";
    private static String OUTPUTFILE = "/location/test.csv";

    public static void main(String[] args) throws DocumentException,
            IOException {
        Document document = new Document();

        PdfWriter writer = PdfWriter.getInstance(document,
                new FileOutputStream(OUTPUTFILE));
        document.open();
        PdfReader reader = new PdfReader(INPUTFILE);
        int n = reader.getNumberOfPages();
        PdfImportedPage page;
        // Go through all pages
        for (int i = 1; i <= n; i++) {
            // Only page number 2 will be included
            if (i == 2) {
                page = writer.getImportedPage(reader, i);
                Image instance = Image.getInstance(page);
                document.add(instance);
            }
        }
        document.close();
    }
} 

2 answers:

Answer 0 (score: 0)

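@AlexisPigeon http://itextpdf.com/itext.php does suggest that this is possible, but iText is not meant for that purpose; it is just a side effect of another part of the implementation.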

Answer 1 (score: 0)

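This converts a PDF file to a CSV file. The directory and file creation here is based on the Android framework; change the paths and directories to suit your own framework.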

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import android.os.Environment;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    private void convertPDFToCSV(String pdfFilePath) {
        String myfolder = Environment.getExternalStorageDirectory() + "/Mycsv";
        if (createFolder(myfolder)) {
            PdfReader reader = null;
            // Write as UTF-8 text; the writer is closed automatically
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream(myfolder + "/MyCSVFile.csv"), StandardCharsets.UTF_8)) {
                reader = new PdfReader(pdfFilePath);
                int n = reader.getNumberOfPages();
                // iText page numbers start at 1
                for (int i = 1; i <= n; i++) {
                    // Extract the text of each page and append it once
                    out.write(PdfTextExtractor.getTextFromPage(reader, i).trim());
                    out.write("\n");
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (reader != null) {
                    reader.close();
                }
            }
        }
    }

    private boolean createFolder(String myfolder) {
        File f = new File(myfolder);
        // True if the folder already exists or was created just now
        return f.exists() || f.mkdir();
    }
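
Note that the snippet above writes the extracted text to the .csv file as-is; it does not split the text into columns or escape anything. A minimal sketch of that remaining step follows. It assumes that columns in the extracted text are separated by a tab or by runs of two or more whitespace characters, which depends heavily on the layout of the PDF and may not hold for your files. The class TextToCsv and its methods are hypothetical helpers, not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of iText): turns extracted text lines into CSV rows.
final class TextToCsv {

    // Quote a field if it contains a comma, a double quote or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    static String escapeField(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Assumption: columns are separated by a tab or by two or more whitespace characters.
    static String toCsvLine(String textLine) {
        String[] columns = textLine.trim().split("\\s{2,}|\\t");
        List<String> escaped = new ArrayList<>();
        for (String column : columns) {
            escaped.add(escapeField(column.trim()));
        }
        return String.join(",", escaped);
    }

    // Convert a whole page of extracted text, skipping blank lines.
    static String toCsv(String extractedText) {
        StringBuilder csv = new StringBuilder();
        for (String line : extractedText.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                csv.append(toCsvLine(line)).append("\r\n");
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the text returned by PdfTextExtractor.getTextFromPage(...)
        String page = "Name        Amount\nSmith, John   1,200\n";
        System.out.print(toCsv(page));
    }
}
```

In the method above, you could pass each page's extracted text through TextToCsv.toCsv(...) before writing it out. If the PDF does not keep columns visibly apart in the extracted text, a whitespace heuristic like this will not be enough and a layout-aware extraction strategy would be needed instead.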