Question

I'm running a loop in Windows PowerShell that runs pdf2txt.py, a script from pdfminer, over a directory of files. Here is the loop:

$PATH="D:/PDFdirectory"

foreach ($f in $PATH)
{   
python pdf2txt.py -o $f.txt "$f" "${f%.pdf}.txt"
}

When I try to run the code above in PowerShell, I get a permission denied error. The error points to the line outfp = file(outfile, 'w+b') in the pdf2txt script below.

if outfile:
    outfp = file(outfile, 'w+b')
else:
    outfp = sys.stdout
if outtype == 'text':
    device = TextConverter(rsrcmgr, outfp, codec=codec, laparams=laparams,
                           imagewriter=imagewriter)
elif outtype == 'xml':
    device = XMLConverter(rsrcmgr, outfp, codec=codec, laparams=laparams,
                          imagewriter=imagewriter)
elif outtype == 'html':
    device = HTMLConverter(rsrcmgr, outfp, codec=codec, scale=scale,
                           layoutmode=layoutmode, laparams=laparams,
                           imagewriter=imagewriter)
elif outtype == 'tag':
    device = TagExtractor(rsrcmgr, outfp, codec=codec)
else:
    return usage()
for fname in args:
    fp = file(fname, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    for page in PDFPage.get_pages(fp, pagenos,
                                  maxpages=maxpages, password=password,
                                  caching=caching, check_extractable=True):
        page.rotate = (page.rotate+rotation) % 360
        interpreter.process_page(page)
    fp.close()
device.close()
outfp.close()
return

if __name__ == '__main__': sys.exit(main(sys.argv))

I have already changed the read and write modes in pdf2txt.py to binary for compatibility with Windows, but now I'm stuck. Can anyone help me?

Thanks

Answer 1

Your first problem is that the syntax in your PowerShell script is incorrect.

This bit:

"${f%.pdf}.txt"

asks PowerShell to look up a variable named f%.pdf and append ".txt" to its value to build the string. You have no such variable, so all you get is ".txt".
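For reference, `${f%.pdf}` is bash-style parameter expansion that strips a trailing `.pdf`; PowerShell has no such syntax. Here is a minimal Python sketch of what that expression was presumably meant to produce. The file name is hypothetical and `txt_name_for` is only an illustrative helper:

```python
from pathlib import PureWindowsPath


def txt_name_for(pdf_path: str) -> str:
    """Return the path with its final extension swapped for .txt.

    Note: with_suffix replaces whatever the last extension is, whereas
    the shell form ${f%.pdf} only strips a literal trailing ".pdf".
    """
    return str(PureWindowsPath(pdf_path).with_suffix(".txt"))


# Hypothetical file name, used only for illustration.
print(txt_name_for("D:/PDFdirectory/report.pdf"))
```

`PureWindowsPath` is used so the sketch handles Windows-style paths the same way on any operating system.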

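Your second problem (I'm guessing here) is that you seem to want to iterate over all the PDF files in that directory, but you haven't told PowerShell to do that.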
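One plausible reading of the permission error, which the question does not confirm: `foreach ($f in $PATH)` loops once over the single string `D:/PDFdirectory`, so `$f` is the directory itself, and a directory path may end up being used as the `-o` output file. Opening a directory for writing fails in Python, and on Windows that failure is reported as Errno 13 (permission denied). A minimal sketch, using a temporary directory as a stand-in and Python 3's `open` in place of the Python 2 `file` call in the script:

```python
import tempfile


def try_open_for_writing(path):
    """Try to open path with the same mode pdf2txt.py uses for its output.

    Returns None on success, or the name of the OSError subclass raised.
    """
    try:
        with open(path, "w+b"):
            return None
    except OSError as exc:
        # Windows reports PermissionError (errno 13) for a directory;
        # POSIX systems report IsADirectoryError instead.
        return type(exc).__name__


with tempfile.TemporaryDirectory() as directory:
    # The directory stands in for D:/PDFdirectory.
    print(try_open_for_writing(directory))
```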

So putting it together, I think you want this code:

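$PATH="D:/PDFdirectory"

# -Filter picks up the *.pdf files directly inside $PATH
# (-Include only takes effect with -Recurse or a wildcard path)
foreach ($f in Get-ChildItem $PATH -Filter *.pdf) {
    python pdf2txt.py -o "$($f.BaseName).txt" -O $f.DirectoryName ($f.FullName -replace '.pdf$','.txt')
}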

More explanation:

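$f.DirectoryName - the path of the directory that contains the file
$f.BaseName - the name of the file without its extension
"$($f.BaseName).txt" - the parentheses mark a subexpression that is evaluated before the final string is built.
($f.FullName -replace '.pdf$','.txt') - uses a regular expression replacement to find .pdf at the end of the full file name (including the path) and replace it with .txt. Strictly speaking, the unescaped . matches any character; '\.pdf$' matches a literal dot.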
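The same pieces have rough Python counterparts, sketched below under a few assumptions: the paths are hypothetical, `swap_extension`, `build_command` and `commands_for_directory` are illustrative helpers rather than part of pdfminer, and the PDF is passed as a positional argument because the script quoted in the question opens each positional argument as an input file. The pattern escapes the dot so that it matches literally.

```python
import re
from pathlib import Path, PureWindowsPath


def swap_extension(name: str) -> str:
    """Replace a trailing .pdf with .txt (the dot is escaped, so it is literal)."""
    return re.sub(r"\.pdf$", ".txt", name)


def build_command(pdf_path: str) -> list:
    """Build the argument list for one pdf2txt.py run (illustrative helper)."""
    return ["python", "pdf2txt.py", "-o", swap_extension(pdf_path), pdf_path]


def commands_for_directory(directory: str) -> list:
    """Build one command per *.pdf file directly inside the directory."""
    pdfs = sorted(Path(directory).glob("*.pdf"))
    return [build_command(str(p)) for p in pdfs]


pdf = PureWindowsPath("D:/PDFdirectory/report.pdf")  # hypothetical file
print(pdf.parent)                # counterpart of $f.DirectoryName
print(pdf.stem)                  # counterpart of $f.BaseName
print(swap_extension(str(pdf)))  # counterpart of the -replace expression
```

Each list returned by `commands_for_directory` could then be handed to `subprocess.run`, which avoids quoting problems with paths that contain spaces.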

IOError: Errno 13 Permission denied D:/PDF when running a Python loop in PowerShell
