How to convert a PDF file to text

Time: 2019-11-07 14:28:41

Tags: android file-manager pdf-reader

I want to pick a PDF file from the file manager on Android and convert it to text so that text-to-speech can read it. I am following this documentation from the Android developer site; however, the example there is for opening a text file. I am using the PdfReader class/library to open the file and convert it to text, but I don't know how to integrate it with a Uri. This is the code I need for converting a PDF to text with PdfReader:

PdfReader pdfReader = new PdfReader(file.getPath());
stringParser = PdfTextExtractor.getTextFromPage(pdfReader, 1).trim();
pdfReader.close();

I am using an intent to call the file manager so that the user can pick a PDF file:

fab.setOnClickListener(new View.OnClickListener() {
   @Override
   public void onClick(View view) {
      intent = new Intent(Intent.ACTION_OPEN_DOCUMENT);
      intent.setType("*/*");
      startActivityForResult(intent, READ_REQUEST_CODE);
   }
});

Then I want to get the Uri and open the file.

2 answers:

Answer 0 (score: 0)

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SyncPdfTextExtractor {
  // TODO: When you have your own Premium account credentials, put them down here:
  private static final String CLIENT_ID = "FREE_TRIAL_ACCOUNT";
  private static final String CLIENT_SECRET = "PUBLIC_SECRET";
  private static final String ENDPOINT = "https://api.whatsmate.net/v1/pdf/extract?url=";

  /**
   * Entry point.
   */
  public static void main(String[] args) throws Exception {
    // TODO: Specify the URL of your small PDF document (less than 1 MB and 10 pages).
    // To extract text from a bigger PDF document, you need to use the async method.
    String url = "https://www.harvesthousepublishers.com/data/files/excerpts/9780736948487_exc.pdf";
    SyncPdfTextExtractor.extractText(url);
  }

  /**
   * Extracts the text from an online PDF document and prints it.
   */
  public static void extractText(String pdfUrl) throws Exception {
    URL url = new URL(ENDPOINT + pdfUrl);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // A GET request has no body, so output is not enabled on the connection.
    conn.setRequestMethod("GET");
    conn.setRequestProperty("X-WM-CLIENT-ID", CLIENT_ID);
    conn.setRequestProperty("X-WM-CLIENT-SECRET", CLIENT_SECRET);

    int statusCode = conn.getResponseCode();
    System.out.println("Status Code: " + statusCode);
    InputStream is;
    if (statusCode == 200) {
      is = conn.getInputStream();
      System.out.println("PDF text is shown below");
      System.out.println("=======================");
    } else {
      is = conn.getErrorStream();
      System.err.println("Something is wrong:");
    }

    // Print the response body line by line; the reader is closed automatically.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
      String output;
      while ((output = br.readLine()) != null) {
        System.out.println(output);
      }
    } finally {
      conn.disconnect();
    }
  }

}
------------------------------------

After copying the code above, follow these steps:

Set the url variable in main to the URL of your online PDF document.
Replace CLIENT_ID and CLIENT_SECRET if you have your own credentials.
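One thing to watch for: the code appends the PDF URL to the endpoint as a query parameter. If your PDF URL contains characters such as `&`, `?` or spaces, it probably needs to be percent-encoded first. Below is a minimal sketch of that step using only the standard library. `ExtractUrlBuilder` and `buildRequestUrl` are hypothetical names, and whether the service actually expects an encoded value is an assumption, not something confirmed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExtractUrlBuilder {
    // Same endpoint as in the answer's code.
    private static final String ENDPOINT = "https://api.whatsmate.net/v1/pdf/extract?url=";

    /** Builds the request URL with the PDF URL percent-encoded as a query value. */
    public static String buildRequestUrl(String pdfUrl) {
        try {
            // Assumption: the service accepts a form-encoded value for "url".
            return ENDPOINT + URLEncoder.encode(pdfUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildRequestUrl("https://example.com/a b.pdf?x=1&y=2"));
    }
}
```

If you go this way, `extractText` would call `new URL(buildRequestUrl(pdfUrl))` instead of concatenating the raw string.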

Answer 1 (score: 0)

Use this.

Gradle:

implementation 'com.itextpdf:itextg:5.5.10'

Code:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

try {
    StringBuilder parsedText = new StringBuilder();
    PdfReader reader = new PdfReader(yourPdfPath);
    int n = reader.getNumberOfPages();
    // Page numbers start at 1, so loop from 1 to n inclusive.
    for (int i = 1; i <= n; i++) {
        // Extract the text of each page and put it on its own line.
        parsedText.append(PdfTextExtractor.getTextFromPage(reader, i).trim()).append("\n");
    }
    System.out.println(parsedText);
    reader.close();
} catch (Exception e) {
    System.out.println(e);
}