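I recently downloaded pypdfocr, but the documentation has no example of how to call pypdfocr as a library. Can someone help me call it to convert a single file? All I have found so far is a terminal command: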
$ pypdfocr filename.pdf
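Answer 0 (score: 1)
If you are looking for the source code, it is usually under the site-packages directory of your Python installation. If you use an IDE (e.g. PyCharm), it will also help you locate the directory and file. That makes it easy to find the class and see how to instantiate it. For example, https://github.com/virantha/pypdfocr/blob/master/pypdfocr/pypdfocr.py defines the main pypdfocr class, which you can reuse and which can probably do whatever the command line does.
In that class, the developer has already defined quite a few arguments to parse: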
    def get_options(self, argv):
        """
            Parse the command-line options and set the following object properties:
            :param argv: usually just sys.argv[1:]
            :returns: Nothing
            :ivar debug: Enable logging debug statements
            :ivar verbose: Enable verbose logging
            :ivar enable_filing: Whether to enable post-OCR filing of PDFs
            :ivar pdf_filename: Filename for single conversion mode
            :ivar watch_dir: Directory to watch for files to convert
            :ivar config: Dict of the config file
            :ivar watch: Whether folder watching mode is turned on
            :ivar enable_evernote: Enable filing to evernote
        """
        p = argparse.ArgumentParser(description = "Convert scanned PDFs into their OCR equivalent. Depends on GhostScript and Tesseract-OCR being installed.",
                epilog = "PyPDFOCR version %s (Copyright 2013 Virantha Ekanayake)" % __version__,
                )
        p.add_argument('-d', '--debug', action='store_true',
            default=False, dest='debug', help='Turn on debugging')
        p.add_argument('-v', '--verbose', action='store_true',
            default=False, dest='verbose', help='Turn on verbose mode')
        p.add_argument('-m', '--mail', action='store_true',
            default=False, dest='mail', help='Send email after conversion')
        p.add_argument('-l', '--lang',
            default='eng', dest='lang', help='Language(default eng)')
        p.add_argument('--preprocess', action='store_true',
            default=False, dest='preprocess', help='Enable preprocessing. Not really useful now with improved Tesseract 3.04+')
        p.add_argument('--skip-preprocess', action='store_true',
            default=False, dest='skip_preprocess', help='DEPRECATED: always skips now.')
        #---------
        # Single or watch mode
        #--------
        single_or_watch_group = p.add_mutually_exclusive_group(required=True)
        # Positional argument for single file conversion
        single_or_watch_group.add_argument("pdf_filename", nargs="?", help="Scanned pdf file to OCR")
        # Watch directory for watch mode
        single_or_watch_group.add_argument('-w', '--watch',
            dest='watch_dir', help='Watch given directory and run ocr automatically until terminated')
        #-----------
        # Filing options
        #----------
        filing_group = p.add_argument_group(title="Filing optinos")
        filing_group.add_argument('-f', '--file', action='store_true',
            default=False, dest='enable_filing', help='Enable filing of converted PDFs')
        #filing_group.add_argument('-c', '--config', type = argparse.FileType('r'),
        filing_group.add_argument('-c', '--config', type = lambda x: open_file_with_timeout(p,x),
            dest='configfile', help='Configuration file for defaults and PDF filing')
        filing_group.add_argument('-e', '--evernote', action='store_true',
            default=False, dest='enable_evernote', help='Enable filing to Evernote')
        filing_group.add_argument('-n', action='store_true',
            default=False, dest='match_using_filename', help='Use filename to match if contents did not match anything, before filing to default folder')
        # Add flow option to single mode extract_images,preprocess,ocr,write
        args = p.parse_args(argv)
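One detail in the parser above is easy to miss: the single-or-watch group is created with required=True, so the argument list must contain either a PDF filename or -w. The stand-alone sketch below copies just that group (plus -v and -l) into a throwaway parser to show how argparse reacts. It uses only the standard library and does not import pypdfocr; build_parser and try_parse are hypothetical helper names made up for this illustration.

```python
import argparse


def build_parser():
    """Stand-in parser that mirrors only a few arguments from get_options()."""
    p = argparse.ArgumentParser(prog="pypdfocr-demo")
    p.add_argument('-v', '--verbose', action='store_true',
                   default=False, dest='verbose')
    p.add_argument('-l', '--lang', default='eng', dest='lang')
    # Same shape as the original: exactly one of <pdf_filename> or -w is required.
    group = p.add_mutually_exclusive_group(required=True)
    group.add_argument("pdf_filename", nargs="?")
    group.add_argument('-w', '--watch', dest='watch_dir')
    return p


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message and calls sys.exit() on errors.
        return None


if __name__ == "__main__":
    print(try_parse([]))                        # rejected: nothing from the required group
    print(try_parse(['filename.pdf']))          # single-file mode
    print(try_parse(['-v', 'filename.pdf']))    # flags are separate string items
```

The takeaway for library use is that the list you hand to the parser should look exactly like sys.argv[1:]: each flag is its own string, and a filename (or -w with a directory) has to be present.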
You can pass any of these arguments to its parser as a list of strings, like this:
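    # Assumption: the class in the linked file is named PyPDFOCR; check your installed version.
    from pypdfocr.pypdfocr import PyPDFOCR
    obj = PyPDFOCR()
    # Pass options as strings, just like sys.argv[1:]. The single-or-watch group is
    # required, so an empty list makes argparse exit; give a filename (or -w <dir>).
    obj.get_options(['-v', 'filename.pdf'])  # other flags work the same way, e.g. ['-d', '-v', 'filename.pdf']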
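If driving the class directly turns out to be awkward, a fallback is to run the terminal command from the question inside Python. This is a minimal sketch, assuming the pypdfocr executable is on your PATH; build_command and ocr_with_cli are hypothetical helper names, and -v is the verbose flag defined in the parser above.

```python
import subprocess


def build_command(pdf_path, verbose=False):
    """Build the argument list for `pypdfocr [-v] <pdf_path>`."""
    cmd = ["pypdfocr"]
    if verbose:
        cmd.append("-v")
    cmd.append(pdf_path)
    return cmd


def ocr_with_cli(pdf_path, verbose=False):
    """Run the pypdfocr command and return its exit code.

    Raises OSError (FileNotFoundError on Python 3) if the executable
    cannot be found on PATH.
    """
    return subprocess.call(build_command(pdf_path, verbose))


# Example (not executed here because it needs pypdfocr installed):
# exit_code = ocr_with_cli("filename.pdf", verbose=True)
```

Passing the command as a list rather than one shell string avoids quoting problems when the PDF path contains spaces.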
I hope this helps :)