Question

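I have some .pdf files that contain comments added in Adobe Acrobat. I would like to analyze these comments, but I'm stuck on extracting them. I looked at the pdftools package, but it only seems to extract text, not comments. Is there a way to extract the comments in R?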

Answer 1

PyMuPDF (https://pymupdf.readthedocs.io/en/latest/) is the only Python library I have found that works for this.

To install it on Debian/Ubuntu-based distributions:

apt-get install python3-fitz

Script:

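import fitz  # PyMuPDF

doc = fitz.open("example.pdf")
# Iterating over the document yields one page at a time
for page in doc:
    # page.annots() yields each annotation on the page
    for annot in page.annots():
        # The comment text is stored under the "content" key
        print(annot.info["content"])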
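
Since the goal is to analyze the comments rather than just print them, the same loop can be adapted to write them to a CSV file, which could then be loaded in R with read.csv. This is only a minimal sketch: the function names collect_rows, write_csv and export_annotations are made up for illustration, and it assumes the annotation's info dictionary stores the author under the "title" key, which is worth checking against your PyMuPDF version.

```python
import csv


def collect_rows(pages):
    """Flatten the annotations of page-like objects into a list of dicts."""
    rows = []
    for page_number, page in enumerate(pages, start=1):
        for annot in page.annots():
            info = annot.info  # dict of annotation metadata
            rows.append({
                "page": page_number,
                # Assumption: the author is stored under "title"
                "author": info.get("title", ""),
                "content": info.get("content", ""),
            })
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["page", "author", "content"])
        writer.writeheader()
        writer.writerows(rows)


def export_annotations(pdf_path, out_path):
    """Hypothetical helper: dump all annotations of one PDF to a CSV file."""
    import fitz  # PyMuPDF, imported here so the helpers above work without it

    doc = fitz.open(pdf_path)
    try:
        write_csv(collect_rows(doc), out_path)
    finally:
        doc.close()
```

Calling export_annotations("example.pdf", "comments.csv") should produce one row per annotation, with the page numbers counted from 1.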

Answer 2

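Have you tried PoDoFo or another open-source tool that can access PDF elements? If you are willing to do a little programming, you can also take a look at "Extracting PDF annotations/comments" here on Stack Overflow.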

Extracting comments from a PDF
