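I have a PDF that looks like this, and I want to crop out all the text, splitting almost exactly down the middle of the page. I found this script that does something similar: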
    # Python 2 code using the legacy pyPdf package
    def splitHorizontal():
        from pyPdf import PdfFileWriter, PdfFileReader
        input1 = PdfFileReader(file("in.pdf", "rb"))
        output = PdfFileWriter()

        numPages = input1.getNumPages()
        print "document has %s pages." % numPages

        for i in range(numPages):
            page = input1.getPage(i)
            # Show the page size (upper-right corner of the media box, in points)
            print page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y()
            page.trimBox.lowerLeft = (25, 25)
            page.trimBox.upperRight = (225, 225)
            page.cropBox.lowerLeft = (50, 50)
            page.cropBox.upperRight = (200, 200)
            output.addPage(page)

        outputStream = file("out.pdf", "wb")
        output.write(outputStream)
        outputStream.close()
However, these crop dimensions are tuned to that specific example. Can anyone tell me how to find the correct crop dimensions?
Answer 0: (score: 1)
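I originally got the script from here -> Cropping pages of a .pdf file.
I read more closely what the author had said, and finally realized that he had stated:
The resulting document has a trim box that is 200x200 points and starts at 25,25 points inside the media box. The crop box is 25 points inside the trim box.
Meaning that
    page.cropBox.upperRight = (200, 200)
must control the final margins, so I adjusted the statement to
    page.cropBox.upperLeft = (290, 792)
to mirror the crop onto the other side of the page and make sure the crop keeps the full vertical extent.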
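To see why changing a single corner is enough, it may help to model the box as the four numbers a PDF rectangle stores, [llx, lly, urx, ury]. The sketch below is a plain-Python stand-in, not the pyPdf class itself: the `Box` class and its method names are hypothetical, and the 612x792 starting box assumes a US Letter page, which the 792 above suggests but does not confirm.

```python
class Box(object):
    """Hypothetical stand-in for a PDF rectangle stored as [llx, lly, urx, ury]."""

    def __init__(self, llx, lly, urx, ury):
        self.coords = [llx, lly, urx, ury]

    def set_lower_left(self, x, y):
        self.coords[0], self.coords[1] = x, y

    def set_upper_right(self, x, y):
        self.coords[2], self.coords[3] = x, y

    def set_upper_left(self, x, y):
        # The upper-left corner shares its x with the left edge
        # and its y with the top edge.
        self.coords[0], self.coords[3] = x, y


# Assumed starting point: a crop box equal to a US Letter media box.
box = Box(0, 0, 612, 792)
box.set_upper_left(290, 792)
print(box.coords)  # [290, 0, 612, 792]
```

Under these assumptions, moving the upper-left corner to x = 290 shifts only the left edge, so the visible region becomes the right-hand part of the page at full height.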
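Answer 1: (score: 0)
Cut each page in half, e.g. if the source was created in booklet format, then recombine the halves for further processing, e.g. text extraction.
Import the required libraries
    from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger
Split the left part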
    with open("docu.pdf", "rb") as in_f:
        input1 = PdfFileReader(in_f)
        output = PdfFileWriter()

        numPages = input1.getNumPages()

        for i in range(numPages):
            page = input1.getPage(i)
            # Keep only the left column
            page.cropBox.lowerLeft = (60, 50)
            page.cropBox.upperRight = (305, 700)
            output.addPage(page)

        # Write while the input file is still open
        with open("left.pdf", "wb") as out_f:
            output.write(out_f)
Split the right part:
    with open("docu.pdf", "rb") as in_f:
        input1 = PdfFileReader(in_f)
        output = PdfFileWriter()

        numPages = input1.getNumPages()

        for i in range(numPages):
            page = input1.getPage(i)
            # Keep only the right column
            page.cropBox.lowerLeft = (300, 50)
            page.cropBox.upperRight = (540, 700)
            output.addPage(page)

        # Write while the input file is still open
        with open("right.pdf", "wb") as out_f:
            output.write(out_f)
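The numbers in the two crop boxes above are specific to that document. One way to derive them for another file, sketched here under the assumption that the page holds two equal-width columns, is to start from the media box corners and split at the horizontal midpoint. `split_columns`, `margin`, and `overlap` are hypothetical names for illustration, not part of PyPDF2.

```python
def split_columns(llx, lly, urx, ury, margin=0.0, overlap=0.0):
    """Return two (lower_left, upper_right) pairs: left half, then right half.

    All values are in PDF points. `margin` trims every outer edge;
    `overlap` lets each half extend slightly past the midpoint.
    """
    mid = (llx + urx) / 2.0
    left = ((llx + margin, lly + margin), (mid + overlap, ury - margin))
    right = ((mid - overlap, lly + margin), (urx - margin, ury - margin))
    return left, right


# Assumed example: a US Letter page, 612 x 792 points, origin at (0, 0).
left, right = split_columns(0, 0, 612, 792, margin=50, overlap=2)
print(left)
print(right)
```

Each returned pair could then be assigned to `page.cropBox.lowerLeft` and `page.cropBox.upperRight`. The media box corners themselves can be read from the page, for instance with `page.mediaBox.getUpperRight_x()` as in the question's script.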
Combine left and right (two columns become two pages)
    input1 = PdfFileReader(open("left.pdf", "rb"))
    input2 = PdfFileReader(open("right.pdf", "rb"))
    output = PdfFileWriter()

    numPages = input1.getNumPages()

    for i in range(numPages):
        l = input1.getPage(i)
        output.addPage(l)
        r = input2.getPage(i)
        output.addPage(r)

    with open("out.pdf", "wb") as out_f:
        output.write(out_f)
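The combine step relies on both files having the same number of pages and on alternating left, right, left, right. As a library-independent sketch of that ordering, the hypothetical helper `interleave` below works on any two sequences; plain strings stand in for page objects.

```python
def interleave(left_pages, right_pages):
    """Return pages in reading order: left half, then right half, per source page."""
    if len(left_pages) != len(right_pages):
        raise ValueError("left and right must have the same number of pages")
    ordered = []
    for left_page, right_page in zip(left_pages, right_pages):
        ordered.append(left_page)
        ordered.append(right_page)
    return ordered


print(interleave(["L1", "L2"], ["R1", "R2"]))  # ['L1', 'R1', 'L2', 'R2']
```

Checking the lengths up front is a design choice of this sketch: it surfaces a mismatch between the two intermediate files instead of silently dropping pages.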