Question

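I wrote this small program to find all the PDFs in a directory, determine whether they are searchable, and then move them into the corresponding directory.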

I'm new to Python and this may not be the best approach, but it works until a file name contains white space, at which point I get the following:

Any help would be greatly appreciated.

>>> os.system("pdffonts.exe " + pdfFile + "> output.txt")
99



import os
import glob
import shutil
directory = os.chdir(r"C:\MyDir") # Change working directory
fileDir = glob.glob('*.pdf') # Create a list of all PDFs in the declared directory
numFiles = len(fileDir) # Length of list
startFile = 0 # Counter variable
seekWord = "TrueType"
while startFile < numFiles:
    pdfFile=fileDir[startFile]
    os.system("pdffonts.exe " + pdfFile + "> output.txt")
    file1output = open("output.txt","r")
    fileContent = file1output.read()
    if seekWord in fileContent:
        shutil.move(pdfFile , "NO_OCR")
    else: shutil.move(pdfFile, "OCR")
    startFile = startFile + 1

Answer 1

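os.system() runs the command through a shell. You have to quote your file name so that the shell treats the spaces as part of the name; you can use the shlex.quote() function for that: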

import shlex

os.system("pdffonts.exe " + shlex.quote(pdfFile) + " > output.txt")
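A small sketch of the difference the quoting makes, using shlex.split() to tokenize the command string roughly the way a POSIX shell would (the file name my file.pdf is only an example):

```python
import shlex

pdfFile = "my file.pdf"  # example name containing a space

# Unquoted: a POSIX-style shell sees the name as two separate arguments
unquoted = shlex.split("pdffonts.exe " + pdfFile)

# Quoted with shlex.quote(): the name stays a single argument
quoted = shlex.split("pdffonts.exe " + shlex.quote(pdfFile))

print(unquoted)  # ['pdffonts.exe', 'my', 'file.pdf']
print(quoted)    # ['pdffonts.exe', 'my file.pdf']
```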

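Keep in mind that shlex.quote() is written for POSIX shells: cmd.exe on Windows does not treat its single quotes as quoting, so there you would wrap the name in double quotes instead. In any case, there is no reason to use os.system() and a shell here. Use the subprocess.run() function instead, and configure it to hand the output back directly, without redirection or a shell: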

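import glob
import shutil
import subprocess

fileDir = glob.glob('*.pdf')  # all PDFs in the current directory, as in the question
seekWord = b"TrueType"  # bytes, because result.stdout is bytes
for pdfFile in fileDir:
    # The list is passed to pdffonts.exe as-is, with no shell in between
    result = subprocess.run(["pdffonts.exe", pdfFile], stdout=subprocess.PIPE)
    fileContent = result.stdout
    if seekWord in fileContent:
        shutil.move(pdfFile, "NO_OCR")
    else:
        shutil.move(pdfFile, "OCR")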

Because pdfFile is passed directly to pdffonts.exe, there is no shell parsing to worry about, and spaces no longer matter.
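To see that a list argument reaches the program unchanged, here is a sketch that uses the running Python interpreter (sys.executable) as a stand-in for pdffonts.exe and has it print the arguments it received (my file.pdf is only an example name):

```python
import subprocess
import sys

pdfFile = "my file.pdf"  # example name containing a space

# Stand-in child process: prints the argument list it was given
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

result = subprocess.run(child + [pdfFile], stdout=subprocess.PIPE)
received = result.stdout.decode().strip()
print(received)  # ['my file.pdf']
```

The child sees 'my file.pdf' as one argument, so no quoting was needed on the Python side.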

Note that I changed seekWord to a bytes literal, because result.stdout is a bytes value (there is no need to decode the result to Unicode here).
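One caveat: subprocess.run() was added in Python 3.5, and the question title mentions Python 3.4.2. On 3.4, subprocess.check_output() also returns the program's output as bytes without going through a shell. A minimal sketch follows. The helper name has_truetype_font and its command parameter are hypothetical; the parameter exists only so the check can be tried with a stand-in program instead of pdffonts.exe:

```python
import subprocess
import sys

def has_truetype_font(pdf_file, command=("pdffonts.exe",)):
    """Return True if the font listing for pdf_file mentions TrueType.

    Hypothetical helper. check_output() is available on Python 3.4 and
    raises CalledProcessError if the program exits with a non-zero status.
    """
    # No shell is involved, so spaces in pdf_file need no quoting
    output = subprocess.check_output(list(command) + [pdf_file])
    return b"TrueType" in output  # output is bytes, so compare with bytes

# Stand-in for pdffonts.exe: prints "TrueType" only for the expected name
fake_pdffonts = (sys.executable, "-c",
                 "import sys; print('TrueType' if sys.argv[1] == 'my file.pdf' else 'Type1')")
print(has_truetype_font("my file.pdf", fake_pdffonts))  # True
print(has_truetype_font("other.pdf", fake_pdffonts))    # False
```

In the question's loop this would be called as has_truetype_font(pdfFile), moving the file to NO_OCR when it returns True and to OCR otherwise.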

Answer 2

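It looks like the problem doesn't come from Python but from the Windows shell: you need to wrap the file name in quotes. Since I don't have your pdffonts.exe program, I can't debug it. I also made your code more Pythonic: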

import os
import glob
import shutil

os.chdir(r"C:\MyDir")  # Change working directory
fileDir = glob.glob('*.pdf')  # Create a list of all PDFs in the declared directory

seekWord = "TrueType"
for pdfFile in fileDir:
    # Double quotes make cmd.exe treat a name containing spaces as one argument
    os.system('pdffonts.exe "{0}" > output.txt'.format(pdfFile))
    # The with block closes output.txt before the next iteration overwrites it
    with open("output.txt", "r") as file1output:
        fileContent = file1output.read()
    if seekWord in fileContent:
        shutil.move(pdfFile, "NO_OCR")
    else:
        shutil.move(pdfFile, "OCR")

Spaces in file names, Python 3.4.2

2 answers: