Question

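I am using Apache PDFBox 2.0.x. I am trying to search a PDF by its bookmarks: when I reach the target bookmark, I should be able to get the page number it refers to. Here is my code, which prints all the bookmarks. I can do the search itself with searchText.equals(current.getTitle()).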

public static void printBookmark(PDOutlineNode bookmark, String indentation) throws IOException {
    PDOutlineItem current = bookmark.getFirstChild();
    COSObject targetPageRef = null;
    while (current != null) {
        System.out.println(indentation + current.getTitle());           
        printBookmark(current, indentation + "    ");
        current = current.getNextSibling();
    }
}

If the title matches my search text, that bookmark is my target. Has anyone tried this?
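
The title comparison described above could be turned into a recursive search along these lines. This is only a minimal sketch assuming PDFBox 2.0.x; the class name BookmarkSearch and the method findBookmark are hypothetical names chosen for illustration, and only the first matching title is returned.

```java
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode;

// Hypothetical helper class, not part of PDFBox.
final class BookmarkSearch {

    /**
     * Depth-first search through the outline tree.
     * Returns the first item whose title equals searchText, or null if none matches.
     */
    static PDOutlineItem findBookmark(PDOutlineNode node, String searchText) {
        for (PDOutlineItem item : node.children()) {
            // searchText is the receiver, so a bookmark without a title is simply skipped.
            if (searchText.equals(item.getTitle())) {
                return item;
            }
            PDOutlineItem nested = findBookmark(item, searchText);
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }
}
```

The search would typically start from document.getDocumentCatalog().getDocumentOutline(), which may be null when the PDF has no bookmarks.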

Answer 1

I found a solution.

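import java.io.IOException;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode;

public static void printBookmark(PDOutlineNode bookmark, String indentation) throws IOException
{
    PDOutlineItem current = bookmark.getFirstChild();
    while (current != null)
    {
        System.out.println(indentation + current.getTitle());
        // getDestination() can return null or a non-page destination, and page
        // destinations are not always fit-width, so check the type before casting.
        PDDestination destination = current.getDestination();
        if (destination instanceof PDPageDestination)
        {
            // In PDFBox 2.0.x retrievePageNumber() is 0-based (-1 if unknown).
            System.out.println("Page Number " + ((PDPageDestination) destination).retrievePageNumber());
        }
        printBookmark(current, indentation + "    ");
        current = current.getNextSibling();
    }
}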
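
Some bookmarks store their target in a GoTo action or as a named destination rather than as a direct page destination. The following is a minimal sketch of how those cases might be resolved, assuming PDFBox 2.0.x; the class name BookmarkPages and the method pageIndexOf are hypothetical names used only for illustration.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDNamedDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;

// Hypothetical helper class, not part of PDFBox.
final class BookmarkPages {

    /** Returns the 0-based page index a bookmark points to, or -1 if it cannot be resolved. */
    static int pageIndexOf(PDDocument document, PDOutlineItem item) throws IOException {
        PDDestination dest = item.getDestination();
        // Fall back to the destination of a GoTo action if there is no direct destination.
        if (dest == null && item.getAction() instanceof PDActionGoTo) {
            dest = ((PDActionGoTo) item.getAction()).getDestination();
        }
        // Named destinations must be looked up through the document catalog.
        if (dest instanceof PDNamedDestination) {
            dest = document.getDocumentCatalog().findNamedDestinationPage((PDNamedDestination) dest);
        }
        if (dest instanceof PDPageDestination) {
            return ((PDPageDestination) dest).retrievePageNumber();
        }
        return -1;
    }
}
```

Adding 1 to the result gives the page number as a PDF viewer would normally display it.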

Answer 2

Given a PDDocument object and a PDOutlineItem belonging to that document, here is another way to find the index of the page the bookmark points to:
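
One way this might look in PDFBox 2.0.x is to let PDOutlineItem.findDestinationPage resolve the target page and then look that page up in the document's page tree. This is a minimal sketch under that assumption, not necessarily the answerer's own code; the class name BookmarkPageIndex and the method pageIndex are hypothetical.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;

// Hypothetical helper class, not part of PDFBox.
final class BookmarkPageIndex {

    /** Returns the 0-based index of the bookmark's target page, or -1 if no page is found. */
    static int pageIndex(PDDocument document, PDOutlineItem bookmark) throws IOException {
        // findDestinationPage handles direct destinations, GoTo actions and named destinations.
        PDPage page = bookmark.findDestinationPage(document);
        if (page == null) {
            return -1;
        }
        return document.getPages().indexOf(page);
    }
}
```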

How to get the page number of a bookmark's content in a PDF using PDFBox
