Create an index from a PDF

Time: 2011-08-02 10:55:04

Tags: perl

  

Possible duplicate:
  How do I Index PDF files and search for keywords?

Create an index from a PDF file.

1 answer:

Answer 0: (score: 1)

I think you can use the pyPdf Python library (http://pybrary.net/pyPdf/). This code prints the index of every page that contains the text you are looking for:

from pyPdf import PdfFileReader

# Open the PDF in binary mode; "reader" avoids shadowing the built-in input()
reader = PdfFileReader(open("YourPDFFile.pdf", "rb"))

numberOfPages = reader.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
for i in range(numberOfPages):
    text = reader.getPage(i).extractText()
    if text.find('What are you looking for') != -1:
        print i  # zero-based page index

The same, but using Python 3:

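# Assumption: the original pyPdf targets Python 2; the PyPDF2 fork
# (versions before 3.0) provides the same names under Python 3
from PyPDF2 import PdfFileReader

# Open the PDF in binary mode; "reader" avoids shadowing the built-in input()
reader = PdfFileReader(open("YourPDFFile.pdf", "rb"))

numberOfPages = reader.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
for i in range(numberOfPages):
    text = reader.getPage(i).extractText()
    if text.find('What are you looking for') != -1:
        print(i)  # zero-based page index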
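
The snippets above only look for a single phrase. Since the question asks for an index, here is a minimal sketch of one way to go from per-page text to a word index. The function name build_index and the sample strings are made up for illustration; with a real PDF the strings would come from extractText() on each page, and the simple \w+ word splitting is an assumption that may need tuning for your documents.

```python
import re
from collections import defaultdict


def build_index(page_texts):
    """Map each lowercase word to the sorted list of page indexes containing it."""
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts):
        # \w+ is a simple tokenizer: runs of letters, digits and underscores
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}


# Stand-in strings; with a real PDF these would be the extracted
# text of each page, in page order.
pages = [
    "Perl is a language",
    "Python can read PDF files",
    "Index the PDF with Perl",
]
index = build_index(pages)
print(index["pdf"])   # [1, 2]
print(index["perl"])  # [0, 2]
```

Once the index is built, looking up a keyword is a dictionary access instead of a scan over every page, which helps if you search the same PDF many times.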