I came across https://stackoverflow.com/a/26673448/8845351, and now I am stuck: how do I write the data retrieved from the PDF to a text file? I have tried PyPDF2 and pdftotext.
My code:
import tempfile, subprocess

def pdf_to_string(file_object):
    pdfData = file_object.read()
    f = open('new_text.odt', 'wb')
    #f.write(tempfile.NamedTemporaryFile())
    # f.close()
    tf = tempfile.NamedTemporaryFile()
    tf.write(pdfData)
    f.write(pdfData)
    tf.seek(0)
    outputTf = tempfile.NamedTemporaryFile()
    if (len(pdfData) > 0) :
        out, err = subprocess.Popen(["pdftotext", "-layout",
                                     tf.name, outputTf.name ]).communicate()
        return outputTf.read()
    else :
        return None

pdf_file="Invoice1.pdf"
file_object = file(pdf_file, 'rb')
print (pdf_to_string(file_object))
print(type(pdf_to_string(file_object)))
Once written, the file contains no data.
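For reference, here is a minimal sketch of what I think the code is meant to do, under some assumptions. A few things in my version look suspicious: the temporary file is never flushed before pdftotext opens it, new_text.odt receives the raw PDF bytes rather than the extracted text and is never closed, and pdf_to_string is called twice on the same file object, so the second call reads zero bytes and reopens new_text.odt in 'wb' mode, which would truncate it. The names pdf_to_text and save_text and the command parameter are hypothetical helpers introduced only for this sketch. It assumes Python 3, a POSIX system, and the pdftotext CLI on PATH.

```python
import subprocess
import tempfile


def pdf_to_text(pdf_bytes, command=("pdftotext", "-layout")):
    """Return the text extracted from pdf_bytes, or None if the input is empty.

    `command` is a hypothetical parameter so the extractor can be swapped out.
    """
    if not pdf_bytes:
        return None
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tf:
        tf.write(pdf_bytes)
        # Push the buffered bytes to disk before the external tool opens the file.
        tf.flush()
        # Passing "-" as the output file makes pdftotext write to stdout.
        result = subprocess.run(
            [*command, tf.name, "-"],
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError if the tool fails
        )
    return result.stdout.decode("utf-8", errors="replace")


def save_text(text, path):
    """Write the extracted text to `path`; the with-block closes the file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)


# Possible usage (read the PDF once and reuse the result):
#     with open("Invoice1.pdf", "rb") as file_object:
#         text = pdf_to_text(file_object.read())
#     if text is not None:
#         save_text(text, "new_text.txt")
```

Reading the PDF once and keeping the returned string avoids the empty second read, and writing to a plain .txt file avoids producing a file with an .odt extension that is not actually an OpenDocument file.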