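We have a simple script that reads incoming PDF files and, if a page is landscape, rotates it to portrait for later use by another program. Everything worked fine with pyPdf until I hit a file where the value of the /Rotate key on a page is an IndirectObject. The object can be resolved, so I can tell what the /Rotate value is, but calling rotateClockwise or rotateCounterClockwise produces a traceback, because pyPdf does not expect an IndirectObject in /Rotate. I have tried quite a few ways of overwriting the IndirectObject in the file with its actual value, but I haven't gotten anywhere. I even tried passing that same IndirectObject to rotateCounterClockwise (In [11] below), and it raises a similar TypeError from a line in pdf.pyc.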
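My question is simple: is there a patch for pyPdf or PyPDF2 that keeps it from choking on this setup, is there a different way to rotate the page, or is there another library I haven't seen or considered? I tried PyPDF2 and it has the same problem. I have looked at PDFMiner as an alternative, but it seems geared more toward extracting information from PDF files than toward manipulating them. Here is the output from playing with the file in IPython using pyPdf; the PyPDF2 output is very similar, with some of the information formatted slightly differently: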
In [1]: from pyPdf import PdfFileReader
In [2]: mypdf = PdfFileReader(open("RP121613.pdf","rb"))
In [3]: mypdf.getNumPages()
Out[3]: 1
In [4]: mypdf.resolvedObjects
Out[4]:
{0: {1: {'/Pages': IndirectObject(2, 0), '/Type': '/Catalog'},
2: {'/Count': 1, '/Kids': [IndirectObject(4, 0)], '/Type': '/Pages'},
4: {'/Count': 1,
'/Kids': [IndirectObject(5, 0)],
'/Parent': IndirectObject(2, 0),
'/Type': '/Pages'},
5: {'/Contents': IndirectObject(6, 0),
'/MediaBox': [0, 0, 612, 792],
'/Parent': IndirectObject(4, 0),
'/Resources': IndirectObject(7, 0),
'/Rotate': IndirectObject(8, 0),
'/Type': '/Page'}}}
In [5]: mypage = mypdf.getPage(0)
In [6]: myrotation = mypage.get("/Rotate")
In [7]: myrotation
Out[7]: IndirectObject(8, 0)
In [8]: mypdf.getObject(myrotation)
Out[8]: 0
In [9]: mypage.rotateCounterClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1049 def rotateCounterClockwise(self, angle):
1050 assert angle % 90 == 0
-> 1051 self._rotate(-angle)
1052 return self
1053
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1057
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [10]: mypage.rotateClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateClockwise(self, angle)
1039 def rotateClockwise(self, angle):
1040 assert angle % 90 == 0
-> 1041 self._rotate(angle)
1042 return self
1043
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1057
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [11]: mypage.rotateCounterClockwise(myrotation)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1048 # @param angle Angle to rotate the page. Must be an increment of 90 deg.
1049 def rotateCounterClockwise(self, angle):
-> 1050 assert angle % 90 == 0
1051 self._rotate(-angle)
1052 return self
TypeError: unsupported operand type(s) for %: 'IndirectObject' and 'int'
If anyone wants to dig into it, I'm happy to provide the file I'm working with.
Answer 0 (score: 3)
You need to call getObject() on the IndirectObject instance itself, so in your case it should be:
myrotation.getObject()
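Building on this answer, here is a minimal sketch of a workaround that leaves the library untouched: resolve /Rotate once, write the plain number back onto the page, and only then call the stock rotate methods. The helper name normalize_rotate is hypothetical, and passing NameObject and NumberObject in as arguments is an assumption made so that the sketch does not import pyPdf itself.

```python
def normalize_rotate(page, name_cls, number_cls):
    """Hypothetical helper: replace an indirect /Rotate value with a direct number.

    `page` is assumed to be a pyPdf/PyPDF2 page (a dict subclass);
    `name_cls` and `number_cls` are expected to be NameObject and
    NumberObject. Returns the resolved rotation angle.
    """
    value = page.get("/Rotate", 0)
    # An IndirectObject resolves through getObject(); the default 0 is a
    # plain int without that method, so it is used as-is.
    if hasattr(value, "getObject"):
        value = value.getObject()
    # Store a direct number so the library's _rotate can add to it later.
    page[name_cls("/Rotate")] = number_cls(value)
    return value
```

With pyPdf this would presumably be called as normalize_rotate(mypage, NameObject, NumberObject), with both classes imported from pyPdf.generic, before mypage.rotateCounterClockwise(90).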
Answer 1 (score: 0)
I realize this is an old question, but I found this post while searching for a fix to the same problem. As far as I can tell, this is a bug; see:
https://github.com/mstamy2/PyPDF2/pull/338/files
Anyway, I edited the PyPDF2 source directly to implement the fix. Open PyPDF2/pdf.py, search for the line def _rotate(self, angle): and replace that method with the following:
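def _rotate(self, angle):
    # /Rotate may be a direct number or an IndirectObject reference
    rotateObj = self.get("/Rotate", 0)
    # Resolve a reference to its underlying number before doing arithmetic
    currentAngle = rotateObj if isinstance(rotateObj, int) else rotateObj.getObject()
    self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)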
It works like a charm now.
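If editing the installed pdf.py is not an option, the same fix could instead be applied at runtime by monkeypatching, which is a swapped-in technique and not what the answer above does. A minimal sketch, assuming PageObject._rotate keeps the (self, angle) signature shown in the traceback; make_safe_rotate is a hypothetical name, and the NameObject/NumberObject classes are passed in so the sketch itself does not import the library.

```python
def make_safe_rotate(name_cls, number_cls):
    """Hypothetical factory returning a drop-in replacement for PageObject._rotate."""

    def _rotate(self, angle):
        rotate_obj = self.get("/Rotate", 0)
        # Plain ints (NumberObject subclasses int, and the default is 0) are
        # used directly; anything else, such as an IndirectObject, is resolved.
        current = rotate_obj if isinstance(rotate_obj, int) else rotate_obj.getObject()
        self[name_cls("/Rotate")] = number_cls(current + angle)

    return _rotate
```

With PyPDF2 1.x this would be installed with something like PageObject._rotate = make_safe_rotate(NameObject, NumberObject), importing PageObject from PyPDF2.pdf and the two object classes from PyPDF2.generic, before any page is rotated. The patch only lives for the current process, so upgrading the library does not silently undo it the way an edited source file would be overwritten.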