I want to convert a PDF file to txt. Here is my code:
testFile = urllib.URLopener()
testFile.retrieve("http://url_to_download" , "/Users/gabor_dev/Desktop/pdf_tst/tst.pdf")
content = ""
pdf = pyPdf.PdfFileReader(file("/Users/gabor_dev/Desktop/pdf_tst/tst.pdf", "rb"))
for i in range(0, pdf.getNumPages()):
    f = open("/Users/gabor_dev/Desktop/pdf_tst/xxx.txt",'a')
    content= pdf.getPage(i).extractText() + "\n"
    c=content.split()
    for a in c:
        f.write(" ")
        f.write(a)
        f.write('\n')
        f.close()
The PDF is downloaded, but when I try to convert it to txt, only the first word of the PDF shows up in my txt file, and then I get this error:
Traceback (most recent call last):
  File "/Users/gabor_dev/PycharmProjects/text_class_tst/textClass.py", line 26, in <module>
    f.write(" ")
ValueError: I/O operation on closed file
What am I doing wrong? Thanks!
Answer 0 (score: 0)
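Judging by the traceback, your f.close() runs inside the inner for loop, so the file is closed right after the first word is written and the next f.write(" ") raises the ValueError. It is better to use with open, which closes the file once, after the whole block has finished: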
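import urllib
import pyPdf

# Download the sample PDF into the current directory
testFile = urllib.URLopener()
testFile.retrieve("http://www.pdf995.com/samples/pdf.pdf" , "./tst.pdf")
content = ""
pdf = pyPdf.PdfFileReader(file("./tst.pdf", "rb"))
# The with block keeps the file open for every page and closes it once at the end
with open("./xxx.txt",'a') as f :
    for i in range(0, pdf.getNumPages()):
        content= pdf.getPage(i).extractText() + "\n"
        c=content.split()
        # Write each word on its own line, prefixed with a space
        for a in c:
            f.write(" ")
            f.write(a)
            f.write('\n')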
Tested and working.
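To see the difference without needing a PDF, here is a minimal sketch in Python 3 using only the standard library. The function names `write_words` and `write_words_buggy`, the sample `pages` list, and the temporary file names are all made up for illustration; the strings in `pages` merely stand in for what `pdf.getPage(i).extractText()` would return.

```python
import os
import tempfile


def write_words(page_texts, out_path):
    """Append every whitespace-separated word of each page, one per line."""
    # The with block closes the file exactly once, after all pages are written.
    with open(out_path, "a") as f:
        for text in page_texts:
            for word in text.split():
                f.write(" " + word + "\n")


def write_words_buggy(page_texts, out_path):
    """Mirror the loop from the question: close() sits inside the inner loop."""
    for text in page_texts:
        f = open(out_path, "a")
        for word in text.split():
            f.write(" ")
            f.write(word)
            f.write("\n")
            f.close()  # closes after the first word, so the next write fails


# Hypothetical page texts standing in for the extracted PDF pages
pages = ["first page text", "second page"]
out_dir = tempfile.mkdtemp()

good_path = os.path.join(out_dir, "good.txt")
write_words(pages, good_path)
with open(good_path) as f:
    good_lines = f.read().splitlines()

bad_path = os.path.join(out_dir, "bad.txt")
try:
    write_words_buggy(pages, bad_path)
    error = None
except ValueError as exc:
    error = str(exc)

print(good_lines)
print(error)
```

The buggy variant writes only the first word before failing, which matches the symptom in the question. If you are on Python 3 with a current library such as pypdf, the page texts could probably be collected with something like `[page.extract_text() for page in PdfReader(path).pages]`, but that depends on your environment and was not part of the tested answer above.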