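I am reading every PDF file on my system and converting each one to the text file "output.txt" with the command-line utility "pdftotext". When it reads files that are not properly constructed (such as PDFs made of images, and many others), it throws errors like: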
/home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
/home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
/home/vikrantsingh/Downloads/Tutorial.pdf
/home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (27972): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (41087): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (51900): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (62716): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (65450): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (68463): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
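What I want is to move on to the next file as soon as the first error is encountered, instead of continuing to read the same file. I am using Python 2.7. My code looks like this: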
import os
import sys
import re
import subprocess

root = '/home'
targetpath = ""
path = os.path.join(root, targetpath)
filepath = []
count = 0
filesize = 0
for r,subdir,f in os.walk(path):
    ultimate_path = os.path.join(path,r)
    for file in f:
        if file.find(".pdf")!=-1:
            print os.path.join(ultimate_path,file)
            filesize = os.path.getsize(os.path.join(ultimate_path,file))+filesize
            subprocess.call(['pdftotext', os.path.join(ultimate_path,file), 'output.txt'])
            #print file
            count = count+1
print count
print filesize/(1048576.0)
This is sample code that reads PDF files with "pdftotext". I want to catch the error so that I can continue with the next PDF.
I have seen one post regarding this. Thanks.
Answer 0 (score: 1)
pdftotext is generating these error messages. They are not Python exceptions, so you cannot catch them with try..except.
You can run pdftotext with -q to silence the error messages:
subprocess.call(['pdftotext', '-q', os.path.join(ultimate_path,file), 'output.txt'])
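If you would rather abandon a file at its first error than silence the messages, one option is to watch the child's stderr yourself and kill the process as soon as a line shows up. The following is only a minimal sketch: the helper name `run_until_first_error` is made up for illustration, and it assumes that pdftotext writes its "Error: ..." messages to stderr and that any stderr line should count as a failure. Do not pass -q here, or there will be nothing to detect.

```python
import subprocess


def run_until_first_error(cmd):
    """Run cmd and stop it as soon as it writes a line to stderr.

    Hypothetical helper (not part of subprocess or pdftotext).
    Returns the first stderr line with surrounding whitespace removed,
    or None if the command finished without writing to stderr.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    # readline() blocks until the child writes a full line or closes stderr.
    first_line = proc.stderr.readline()
    if first_line:
        try:
            proc.kill()  # give up on this file at the first message
        except OSError:
            pass  # the process had already exited on its own
    proc.stderr.close()
    proc.wait()  # reap the child so no zombie process is left behind
    return first_line.strip() or None
```

Inside the loop from the question you could then replace the `subprocess.call(...)` line with `err = run_until_first_error(['pdftotext', os.path.join(ultimate_path,file), 'output.txt'])` and skip to the next file when `err` is not None. Note that a killed pdftotext may leave a partial output.txt behind, so you may want to discard the output for that file.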