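I have 2 PDF files that I convert to text every day in the terminal with the pdftotext tool, and then I sort the data with a Python script. To save the time of typing
for file in *.pdf; do pdftotext -layout "$file"; done
in the terminal, I figured I could add it to the top of the following .py script.
I think subprocess is the answer, but for the life of me I can't get it to save the .txt file next to the .pdf the way pdftotext does from the terminal.
I've tried "quotes", [lists], and "quotes" inside [lists], and even tried "/usr/lib/pdftotext".
Why doesn't this work for the PDF in my home folder?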
import subprocess
process = subprocess.Popen(['pdftotext', '-layout', 'ALL.pdf', 'ALL.txt'])
Thanks
Answer 0 (score: 1)
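Not an answer, but a way to get one. When a program doesn't work, start testing your assumptions and printing information. This example should narrow down the possible problems, and you should see some output from it. If you don't, the issue may be how you are running the program. I'm assuming you run it from the command line, where you can see standard output and standard error from both python and pdftotext. If not, this will need updating.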
#!/usr/bin/env python
from __future__ import print_function  # lets print() work on Python 2 and 3

import subprocess
import os
import sys
import time

# just in case we are double clicking from a windows manager and the
# window doesn't stay up very long
print("Starting in...")
for i in range(3, 0, -1):
    print(i)
    time.sleep(1)

# if you don't see these, it may be how you are running the program.
print("Platform", sys.platform)
print("Running in directory", os.getcwd())

# relative names like 'ALL.pdf' are resolved against the directory above
if not os.path.isfile('ALL.pdf'):
    print("ALL.pdf does not exist", file=sys.stderr)
    sys.exit(2)

if os.path.isfile("ALL.txt"):
    print("Found old ALL.txt, deleting")
    os.remove("ALL.txt")

print("Running pdftotext...")
# check_output raises CalledProcessError if pdftotext exits non-zero
subprocess.check_output(['pdftotext', '-layout', 'ALL.pdf', 'ALL.txt'])

if not os.path.isfile('ALL.txt'):
    print("Program did not create ALL.txt")
    sys.exit(2)

print("Success! ALL.txt was written.")
# note: if you don't see "Success!" something bad happened
I created a test file and ran it:
td@mintyfresh ~/tmp $ python test.py
Starting in...
3
2
1
Platform linux2
Running in directory /home/td/tmp
Found old ALL.txt, deleting
Running pdftotext...
Success! ALL.txt was written.
Then I injected a file-not-found error and ran it again:
td@mintyfresh ~/tmp $ mv ALL.pdf ALL.pdf-tmp
td@mintyfresh ~/tmp $ python test.py
Starting in...
3
2
1
Platform linux2
Running in directory /home/td/tmp
ALL.pdf does not exist
This makes an obvious problem easy to spot.
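If ALL.pdf is found but ALL.txt still doesn't appear, the next clue is pdftotext's own exit status and error message. Here is a minimal sketch (Python 3.5+; the helper name run_and_report is hypothetical, not part of any library) that captures both explicitly instead of relying on a terminal window staying open:

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command, print its exit status and stderr, and return the status.

    Returns 127 (the shell's conventional "command not found" code) if the
    program itself cannot be found on PATH.
    """
    try:
        result = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode captured output to str
        )
    except FileNotFoundError:
        # raised when the executable (e.g. pdftotext) is not installed/on PATH
        print("command not found:", args[0], file=sys.stderr)
        return 127
    print("exit status:", result.returncode)
    if result.stderr:
        print("stderr:", result.stderr.strip(), file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    run_and_report(["pdftotext", "-layout", "ALL.pdf", "ALL.txt"])
```

A non-zero status plus pdftotext's message (for example, about a file it couldn't open) usually points straight at a path or working-directory problem.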
Answer 1 (score: 0)
Here's what you should do; for more information, check the subprocess documentation:
https://docs.python.org/2/library/subprocess.html
import os, subprocess
process = subprocess.call('pdftotext -layout ALL.pdf ALL.txt', shell=True, cwd=os.path.expanduser('~'))
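To get back to the original goal of replacing the shell loop, here is a sketch (Python 3.5+; build_command and convert_all are hypothetical names) that converts every PDF in a folder. It relies on pdftotext writing file.txt next to file.pdf when no output name is given, which is what the shell loop relies on too. It also builds absolute paths, so where the .txt files land no longer depends on the directory the script is launched from. That is the same problem Answer 1 solves with cwd, and it uses the list form, so shell=True isn't needed.

```python
import glob
import os
import subprocess
import sys


def build_command(pdf_path):
    """Return the pdftotext argument list for one PDF.

    With no output name, pdftotext writes <name>.txt next to <name>.pdf,
    mirroring: for file in *.pdf; do pdftotext -layout "$file"; done
    """
    return ["pdftotext", "-layout", pdf_path]


def convert_all(directory):
    """Convert every *.pdf in `directory`; return the sorted list of PDFs processed."""
    # Absolute paths make the result independent of the current working directory.
    pattern = os.path.join(os.path.abspath(directory), "*.pdf")
    pdfs = sorted(glob.glob(pattern))
    for pdf in pdfs:
        # check=True raises CalledProcessError if pdftotext exits non-zero;
        # FileNotFoundError is raised if pdftotext is not on PATH.
        subprocess.run(build_command(pdf), check=True)
    return pdfs


if __name__ == "__main__":
    # Use the folder given on the command line, else the folder holding this script.
    if len(sys.argv) > 1:
        target = sys.argv[1]
    else:
        target = os.path.dirname(os.path.abspath(sys.argv[0]))
    for pdf in convert_all(target):
        print("converted", pdf)
```

If saved as, say, convert_pdfs.py, you could run it as python convert_pdfs.py ~/some/folder, or with no argument to convert the PDFs sitting next to the script, and then call your sorting code afterwards in the same script.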