How to handle errors thrown by pdftotext in Python

Date: 2013-02-23 11:20:18

Tags: python python-2.7 pdftotext

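I am reading all the PDF files on my system and converting each one with the command-line utility "pdftotext" into the text file "output.txt". But when it reads files that are not properly constructed (such as image-only PDFs, among many others), it prints errors like these: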

    /home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
    /home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
    /home/vikrantsingh/Downloads/Tutorial.pdf
    /home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (27972): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (41087): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (51900): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (62716): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (65450): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'
    Error (68463): No font in show
    Error: Missing language pack for 'Adobe-Japan1' mapping
    Error: Unknown font tag 'C0_0'

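What I want is this: as soon as the first error is encountered, simply move on to the next file instead of continuing with the same one. I am using Python 2.7. My code looks like this: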

    import os
    import subprocess

    root = '/home'
    targetpath = ""
    path = os.path.join(root, targetpath)
    count = 0      # number of PDF files found
    filesize = 0   # total size of those files in bytes
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            if name.lower().endswith(".pdf"):
                pdf_path = os.path.join(dirpath, name)
                print(pdf_path)
                filesize += os.path.getsize(pdf_path)
                # every PDF is written to the same output.txt, overwriting the previous text
                subprocess.call(['pdftotext', pdf_path, 'output.txt'])
                count += 1

    print(count)
    print(filesize / 1048576.0)  # total size in MiB

This is sample code that converts PDF files with "pdftotext". I want to catch the errors so that I can move on to the next PDF.

I have seen one post regarding this. Thanks.

1 answer:

Answer 0 (score: 1)

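These error messages are produced by pdftotext itself and written to the child process's standard error. They are not Python exceptions, so you cannot catch them with try..except.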
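Because the messages only arrive on stderr, one way to get a real Python exception is to capture stderr yourself and raise. This is a hedged sketch, not part of the original answer: `run_checked` and `ExternalToolError` are hypothetical names, and treating a non-zero exit status or any stderr line starting with "Error" as a failure is an assumption based on the log in the question. It waits for the command to finish, so it lets you catch and skip a bad file but does not stop pdftotext early. It uses only `subprocess` features available in both Python 2.7 and Python 3.

```python
import subprocess
import sys


class ExternalToolError(Exception):
    """Hypothetical exception raised when an external tool reports problems."""


def run_checked(cmd):
    """Run cmd, capture its stderr, and raise ExternalToolError on problems.

    A problem is assumed to be a non-zero exit status or any stderr line
    starting with "Error" (the prefix pdftotext uses in the question's log).
    Returns the captured stdout on success.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()  # waits until the process exits
    error_lines = [line for line in err.splitlines() if line.startswith("Error")]
    if proc.returncode != 0 or error_lines:
        first = error_lines[0] if error_lines else "(none)"
        raise ExternalToolError("%s exited with %d; first error: %s"
                                % (cmd[0], proc.returncode, first))
    return out


if __name__ == "__main__":
    # Usage sketch: python script.py a.pdf b.pdf ...
    for pdf_path in sys.argv[1:]:
        try:
            run_checked(["pdftotext", pdf_path, "output.txt"])
        except ExternalToolError as exc:
            print("skipping %s: %s" % (pdf_path, exc))
```

If the pdftotext executable cannot be found at all, `Popen` itself raises `OSError`, which is a genuine Python exception and can be caught the same way.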

You can run pdftotext with -q to silence the error messages. Note that this only hides them; pdftotext still processes the whole file:

    subprocess.call(['pdftotext', '-q', os.path.join(ultimate_path,file), 'output.txt'])
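To actually move on at the first error, as the question asks, one possible approach (an assumption, not something the answer proposes) is to run pdftotext without -q, read its stderr line by line, and kill the process as soon as a line starting with "Error" appears. `convert_until_first_error` is a hypothetical helper, and it assumes pdftotext writes each message to stderr as it goes (stderr is normally unbuffered in C programs). It is written to run on both Python 2.7 and Python 3.

```python
import subprocess
import sys


def convert_until_first_error(cmd):
    """Run cmd and kill it at the first stderr line starting with "Error".

    Returns (True, None) if the process finished with exit status 0,
    otherwise (False, reason). stdout is inherited, since pdftotext writes
    its text to the output file given on the command line.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    first_error = None
    for line in iter(proc.stderr.readline, ""):
        if line.startswith("Error"):
            first_error = line.rstrip("\n")
            proc.kill()  # stop working on this file right away
            break
    proc.stderr.close()
    returncode = proc.wait()  # reap the process whether it was killed or not
    if first_error is not None:
        return False, first_error
    if returncode != 0:
        return False, "exit status %d" % returncode
    return True, None


if __name__ == "__main__":
    # Usage sketch: python script.py a.pdf b.pdf ...
    for pdf_path in sys.argv[1:]:
        ok, reason = convert_until_first_error(["pdftotext", pdf_path, "output.txt"])
        if not ok:
            # give up on this file and go on with the next one
            print("skipping %s: %s" % (pdf_path, reason))
```

Many of these messages are recoverable, so killing the process can leave output.txt holding only part of a document whose text was mostly extractable; the plain -q approach keeps that text at the cost of never noticing the problem.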