Question

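I am trying to see whether I can use the background and foreground colors of text to identify likely table headers in tables inside a PDF. With PyMuPDF text extraction I can get the foreground color. Is there also a way to get the background color?

I am using PyMuPDF 1.16.2 and Python 3.7. I have checked the documentation, but I could only find a single color field, and it is associated with the text color rather than the background color.

If anyone knows how to get the background color with PyMuPDF, or with some other package, please let me know.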

Answer 1

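I needed a similar feature but could not find it in PyMuPDF, so I wrote a function that renders the page to an image and reads the color of the pixel at the top-left corner of the bbox containing the text.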

import io

import fitz  # PyMuPDF
from PIL import Image


def getText2(page: fitz.Page, zoom_f=3) -> dict:
    """
    Function similar to fitz.Page.getText("dict"), but each text block in
    "blocks" that lies fully inside the page also gets a key "bg_color"
    holding a color tuple of floats in the range [0, 1].
    """
    # Retrieve the content of the page
    # (newer PyMuPDF releases name this method get_text)
    all_words = page.getText("dict")

    # Render the page and load it as a PIL.Image
    # (newer PyMuPDF releases: get_pixmap and Pixmap.tobytes)
    mat = fitz.Matrix(zoom_f, zoom_f)
    pixmap = page.getPixmap(mat)
    img = Image.open(io.BytesIO(pixmap.getPNGData()))
    img_border = fitz.Rect(0, 0, img.width, img.height)
    for block in all_words['blocks']:
        # Handle only text blocks (type 0)
        if block['type'] == 0:
            # Scale the bbox from page coordinates to image pixels
            rect = fitz.Rect(*tuple(xy * zoom_f for xy in block['bbox']))
            if img_border.contains(rect):
                # Pixel coordinates must be integers
                color = img.getpixel((int(rect.x0), int(rect.y0)))
                block['bg_color'] = tuple(c/255 for c in color)
    return all_words
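
To tie this back to the goal in the question (spotting likely table headers), here is a minimal sketch of one way to use the result. It assumes the dict has the shape returned by getText2 above, and it treats the most common "bg_color" as the page background. The helper name find_header_candidates and the tol parameter are hypothetical and not part of PyMuPDF.

```python
from collections import Counter


def find_header_candidates(page_dict, tol=0.02):
    """Return blocks whose bg_color differs from the dominant background.

    Hypothetical helper: expects a dict shaped like the output of getText2,
    where some blocks carry a "bg_color" tuple of floats in [0, 1].
    """
    blocks = [b for b in page_dict.get("blocks", []) if "bg_color" in b]
    if not blocks:
        return []

    def bucket(color):
        # Round so that near-identical colors are counted together
        return tuple(round(c, 2) for c in color)

    # Assumption: the most frequent background is the normal page background
    dominant = Counter(bucket(b["bg_color"]) for b in blocks).most_common(1)[0][0]

    # Keep blocks where any channel deviates from the dominant color by more than tol
    return [
        b for b in blocks
        if any(abs(c - d) > tol for c, d in zip(b["bg_color"], dominant))
    ]


# Hand-made sample data standing in for getText2(page)
sample = {
    "blocks": [
        {"type": 0, "bbox": (50, 40, 300, 55), "bg_color": (0.8, 0.8, 0.8)},
        {"type": 0, "bbox": (50, 60, 300, 75), "bg_color": (1.0, 1.0, 1.0)},
        {"type": 0, "bbox": (50, 80, 300, 95), "bg_color": (1.0, 1.0, 1.0)},
        {"type": 1, "bbox": (50, 100, 300, 200)},  # image block, no bg_color
    ]
}
candidates = find_header_candidates(sample)
```

With real PDFs the tolerance probably needs tuning, and a shaded block is only a hint that it may be a header, not proof.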
