Android: Getting text from a PDF

Date: 2015-04-21 05:08:59

Tags: android pdf

I want to read the text of a PDF file stored on the SD card. How can I extract text from a PDF file on the SD card?

This is what I tried:

public class MainActivity extends ActionBarActivity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;
    private String line = null;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);

        final TextView text1 = (TextView) findViewById(R.id.textView1);

        findViewById(R.id.button1).setOnClickListener(new OnClickListener() {

            private String[] arr;

            @Override
            public void onClick(View v) {
                File sdcard = Environment.getExternalStorageDirectory();

                // Get the text file

                File file = new File(sdcard, "test.pdf");

                // ob.pathh
                // Read text from file

                StringBuilder text = new StringBuilder();
                try {
                    BufferedReader br = new BufferedReader(new FileReader(file));

                    // int i=0;
                    List<String> lines = new ArrayList<String>();

                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                        // arr[i]=line;
                        // i++;
                        text.append(line);
                        text.append('\n');
                    }
                    for (String string : lines) {
                        // Queue each line so it does not cut off the previous one
                        tts.speak(string, TextToSpeech.QUEUE_ADD, null);
                    }
                    arr = lines.toArray(new String[lines.size()]);
                    System.out.println(arr.length);
                    text1.setText(text);

                } catch (Exception e) {
                    e.printStackTrace();
                }

            }
        });

    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = tts.setLanguage(Locale.US);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "This Language is not supported");
            } else {
                // speakOut();
            }

        } else {
            Log.e("TTS", "Initialization Failed!");
        }
    }

}

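Note: this works fine when the file is a plain text file (test.txt), but not when it is a PDF (test.pdf).

With the PDF, what I get back is not the document's text but something that looks like raw bytes. How can I get the actual text?

Thanks in advance.

2 answers: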

Answer 0: (score: 16)

I have a solution using iText.

Gradle:

compile 'com.itextpdf:itextg:5.5.10'

Java:

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    try {
        StringBuilder parsedText = new StringBuilder();
        PdfReader reader = new PdfReader(yourPdfPath);
        int n = reader.getNumberOfPages();
        // iText page numbers are 1-based
        for (int page = 1; page <= n; page++) {
            // Extract the text of each page and separate pages with a newline
            parsedText.append(PdfTextExtractor.getTextFromPage(reader, page).trim()).append("\n");
        }
        System.out.println(parsedText);
        reader.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
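
The code in the question speaks the file line by line, while the extraction above yields one long string. A minimal sketch of bridging the two, assuming the extracted text is available as a plain `String`: the class `ParsedTextLines` and the method `toLines` are hypothetical names used only for illustration and are not part of iText or Android.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText or the Android SDK.
class ParsedTextLines {

    // Splits extracted text on line breaks and keeps only trimmed, non-empty lines.
    static List<String> toLines(String parsedText) {
        List<String> lines = new ArrayList<>();
        for (String raw : parsedText.split("\\r?\\n")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample string standing in for text extracted from a PDF.
        String sample = "First page text\n\n  Second page text  \n";
        for (String line : toLines(sample)) {
            System.out.println(line);
        }
    }
}
```

Each element of the returned list could then be passed to `tts.speak(...)` in the same loop the question already uses.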

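Answer 1: (score: 2)

A PDF is not an ordinary text file, so reading it line by line will not give you its text. You need to read up on the PDF format. This is the best answer you will get: How to read pdf in my android application?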
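
To illustrate the point that a PDF is a binary container rather than plain text: PDF files start with the marker `%PDF-`, and the page text is usually held in compressed streams, which is why `BufferedReader.readLine()` returns what looks like garbage. The sketch below only checks for that marker so an app could decide whether to use a PDF library or a plain text reader. The class `PdfSniffer` and its methods are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate the PDF header check.
class PdfSniffer {

    // Every PDF file begins with these five ASCII bytes.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes start with the PDF marker.
    static boolean startsWithPdfHeader(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first few bytes of the file and checks them for the marker.
    static boolean looksLikePdf(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the marker
                }
                read += n;
            }
            return read == head.length && startsWithPdfHeader(head);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(looksLikePdf(new File(args[0])));
        }
    }
}
```

A check like this does not extract any text. It only tells the two cases from the question apart, test.txt and test.pdf, so that the PDF case can be handed to a parser such as iText.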