How to split/crop a PDF down the middle with pyPdf

Date: 2014-12-06 21:12:15

Tags: python pdf pypdf

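I have a PDF that looks like this, and I'd like to crop all the text out, almost right down the middle of the page. I found this script that does something similar: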

def splitHorizontal():
    from pyPdf import PdfFileWriter, PdfFileReader  # pyPdf runs on Python 2 only
    input1 = PdfFileReader(open("in.pdf", "rb"))
    output = PdfFileWriter()

    numPages = input1.getNumPages()
    print "document has %s pages." % numPages

    for i in range(numPages):
        page = input1.getPage(i)
        # print the page size so suitable crop coordinates can be chosen
        print page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y()
        # trim box: 200x200 points, starting 25,25 points inside the media box
        page.trimBox.lowerLeft = (25, 25)
        page.trimBox.upperRight = (225, 225)
        # crop box: 25 points inside the trim box
        page.cropBox.lowerLeft = (50, 50)
        page.cropBox.upperRight = (200, 200)
        output.addPage(page)

    outputStream = open("out.pdf", "wb")
    output.write(outputStream)
    outputStream.close()

splitHorizontal()

However, these crop dimensions are tailored to that particular example. Can anyone tell me how to find the right crop dimensions?

2 answers:

Answer 0 (score: 1)

I originally got the script from here: Cropping pages of a .pdf file

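I read what the author had written more carefully and finally realized he had said:

The resulting document has a trim box that is 200x200 points and starts at 25,25 points inside the media box. The crop box is 25 points inside the trim box.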
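The quoted sentence is plain rectangle arithmetic: a PDF box is stored as [lower-left x, lower-left y, upper-right x, upper-right y] in points. As a rough sketch, the hypothetical helpers `inset()` and `size()` below (not part of pyPdf) reproduce the numbers used in the question's script:

```python
# Boxes are (lower-left x, lower-left y, upper-right x, upper-right y) in points,
# the same order a PDF rectangle array uses.

def inset(box, d):
    """Return `box` shrunk by `d` points on every side (hypothetical helper)."""
    llx, lly, urx, ury = box
    return (llx + d, lly + d, urx - d, ury - d)


def size(box):
    """Return (width, height) of a box (hypothetical helper)."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


trim_box = (25, 25, 225, 225)   # values from the script's trimBox lines
crop_box = inset(trim_box, 25)  # "25 points inside the trim box"

print(size(trim_box))  # (200, 200)
print(crop_box)        # (50, 50, 200, 200)
```

So only the crop box decides what is visible; the trim box values don't change the cropped result.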

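In other words,

page.cropBox.upperRight = (200, 200)

is what controls the final margin, so I changed that statement to

page.cropBox.upperLeft = (290, 792)

which mirrors the crop onto the other side of the page and makes sure the crop keeps the full vertical extent.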
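To avoid hard-coding values such as 290 and 792, the crop rectangles can be derived from each page's media box. This is a sketch in plain Python; `split_halves()` and its `overlap` parameter are hypothetical names, and it assumes the page is not rotated (a /Rotate entry would change which edge appears on the left). With pyPdf, the four inputs would come from `page.mediaBox.getLowerLeft_x()`, `getLowerLeft_y()`, `getUpperRight_x()` and `getUpperRight_y()`.

```python
def split_halves(llx, lly, urx, ury, overlap=0):
    """Return (left_box, right_box), splitting a box down its vertical midline.

    `overlap` (hypothetical parameter) extends each half past the midline by
    that many points, so text straddling the fold is not clipped.
    """
    mid = (llx + urx) / 2.0
    left = (llx, lly, mid + overlap, ury)
    right = (mid - overlap, lly, urx, ury)
    return left, right


# US Letter media box: 612 x 792 points
left, right = split_halves(0, 0, 612, 792)
print(left)   # (0, 0, 306.0, 792)
print(right)  # (306.0, 0, 612, 792)
```

Each tuple could then be applied with `page.cropBox.lowerLeft = box[:2]` and `page.cropBox.upperRight = box[2:]`, using a separate reader for each half (as the second answer does) so that the two halves don't share one page object.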

Answer 1 (score: 0)

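Cut every page in half (useful, for example, when the source was created as a booklet), then recombine the halves for further processing such as text extraction.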

Import the required classes:

from PyPDF2 import PdfFileWriter, PdfFileReader  # legacy class names, removed in PyPDF2 3.0

Split off the left half:

with open("docu.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()

    numPages = input1.getNumPages()

    for i in range(numPages):
        page = input1.getPage(i)
        # keep x from 60 to 305 pt and y from 50 to 700 pt (values tuned to this document)
        page.cropBox.lowerLeft = (60, 50)
        page.cropBox.upperRight = (305, 700)
        output.addPage(page)

    with open("left.pdf", "wb") as out_f:
        output.write(out_f)

Split off the right half:

with open("docu.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()

    numPages = input1.getNumPages()

    for i in range(numPages):
        page = input1.getPage(i)
        # start at x = 300 pt, slightly overlapping the left half (which ends at 305 pt)
        page.cropBox.lowerLeft = (300, 50)
        page.cropBox.upperRight = (540, 700)
        output.addPage(page)

    with open("right.pdf", "wb") as out_f:
        output.write(out_f)

Combine left and right (two columns into two pages):

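with open("left.pdf", "rb") as left_f, open("right.pdf", "rb") as right_f:
    input1 = PdfFileReader(left_f)
    input2 = PdfFileReader(right_f)
    output = PdfFileWriter()
    numPages = input1.getNumPages()

    # interleave: left half of page i, then right half of page i
    for i in range(numPages):
        l = input1.getPage(i)
        output.addPage(l)
        r = input2.getPage(i)
        output.addPage(r)

    # write while the inputs are still open; PyPDF2 reads page content lazily
    with open("out.pdf", "wb") as out_f:
        output.write(out_f)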
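The recombination loop only fixes the page order: left half of page i, then right half of page i. The ordering on its own can be sketched with plain lists standing in for page objects (an assumption for illustration); `interleave()` is a hypothetical helper, not a PyPDF2 function:

```python
from itertools import chain


def interleave(lefts, rights):
    """Return [l0, r0, l1, r1, ...]; both inputs must have the same length."""
    if len(lefts) != len(rights):
        raise ValueError("left and right inputs differ in page count")
    return list(chain.from_iterable(zip(lefts, rights)))


print(interleave(["p1-left", "p2-left"], ["p1-right", "p2-right"]))
# ['p1-left', 'p1-right', 'p2-left', 'p2-right']
```

The length check is there because the loop above takes its page count from left.pdf only and would fail if right.pdf had fewer pages.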