How do I use Python's typing module?

Asked: 2019-12-31 00:24:54

Tags: python types

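I am trying to run OCRmyPDF through its Python API. It runs fine if I don't declare a language, but it raises an error as soon as I try to declare one. The api.py file uses typing to declare language: List[str] = None, so I imported List from typing and tried to declare a lang variable, which raised the error.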

My code:

source = 'fr'; target = 'en'; tess_lang = 'fr'
x: List[str] = ['eng', 'fr']
for dirpath, dirs, files in os.walk('.'):
    print(files)
    for pdf in [file for file in files if '.pdf' in file.lower()]:
        ocrmypdf.ocr(language = x, input_file = pdf, output_file = pdf.rsplit('.', 1)[0]+'_new.pdf', rotate_pages=True, deskew=True, force_ocr = True)

Error:

<ipython-input-41-90f7f46b6092> in <module>
      5     print(files)
      6     for pdf in [file for file in files if '.pdf' in file.lower()]:
----> 7         ocrmypdf.ocr(language = x, input_file = pdf, output_file = pdf.rsplit('.', 1)[0]+'_new.pdf', rotate_pages=True, deskew=True, force_ocr = True)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\api.py in ocr(input_file, output_file, language, image_dpi, output_type, sidecar, jobs, use_threads, title, author, subject, keywords, rotate_pages, remove_background, deskew, clean, clean_final, unpaper_args, oversample, remove_vectors, threshold, force_ocr, skip_text, redo_ocr, skip_big, optimize, jpg_quality, png_quality, jbig2_lossy, jbig2_page_group_size, pages, max_image_mpixels, tesseract_config, tesseract_pagesegmode, tesseract_oem, pdf_renderer, tesseract_timeout, rotate_pages_threshold, pdfa_image_compression, user_words, user_patterns, fast_web_view, keep_temporary_files, progress_bar, tesseract_env)
    248     """
    249 
--> 250     options = create_options(**locals())
    251     check_options(options)
    252     return run_pipeline(options, api=True)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ocrmypdf\api.py in create_options(input_file, output_file, **kwargs)
    149             cmdline.append(str(val))
    150         else:
--> 151             raise TypeError(f"{arg}: {val} ({type(val)})")
    152 
    153     cmdline.append(str(input_file))

TypeError: language: ['eng', 'fr'] (<class 'list'>)

1 Answer:

Answer 0 (score: 0)

This looks like a bug in their type annotation; it should probably be language: Optional[str] = None

To pass multiple languages, it looks like you can use language='eng+fr'
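As a minimal sketch of that workaround, the list from the question can be joined into a single '+'-separated string before it is passed on. The helper names to_language_arg and ocr_pdf are hypothetical and not part of ocrmypdf; the keyword arguments to ocrmypdf.ocr are the ones already used in the question.

```python
from typing import Iterable, Union


def to_language_arg(languages: Union[str, Iterable[str]]) -> str:
    """Hypothetical helper: turn ['eng', 'fr'] into 'eng+fr'.

    A plain string is returned unchanged, so callers can pass either form.
    """
    if isinstance(languages, str):
        return languages
    return "+".join(languages)


def ocr_pdf(pdf_path: str, languages: Union[str, Iterable[str]]) -> None:
    """Hypothetical wrapper around the call from the question."""
    # Imported here so the helper above can be used without ocrmypdf installed.
    import ocrmypdf

    ocrmypdf.ocr(
        input_file=pdf_path,
        output_file=pdf_path.rsplit(".", 1)[0] + "_new.pdf",
        language=to_language_arg(languages),  # a single str, not a list
        rotate_pages=True,
        deskew=True,
        force_ocr=True,
    )
```

One thing to check separately: Tesseract language packs are normally named with three-letter codes, so French is usually fra rather than fr, and the string that actually works may be 'eng+fra'.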

The way the "api" works is that it collects all of those arguments and then feeds them through their command-line parser, which only supports scalar values (no List, etc.)
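To illustrate why the list is rejected, here is a simplified stand-in for that argument-to-command-line step. It is not ocrmypdf's actual code: the function name build_cmdline and the flag formatting are assumptions, and only the final TypeError line mirrors what the traceback shows.

```python
from typing import Any, List


def build_cmdline(**kwargs: Any) -> List[str]:
    """Hypothetical sketch: convert keyword arguments into CLI-style tokens."""
    cmdline: List[str] = []
    for arg, val in kwargs.items():
        if val is None:
            continue  # unset options are simply skipped
        flag = "--" + arg.replace("_", "-")
        if isinstance(val, bool):
            if val:
                cmdline.append(flag)  # boolean switches carry no value
        elif isinstance(val, (int, float, str)):
            cmdline.append(flag)
            cmdline.append(str(val))  # scalars become "--flag value"
        else:
            # Same message format as the traceback in the question.
            raise TypeError(f"{arg}: {val} ({type(val)})")
    return cmdline
```

In this sketch, build_cmdline(language='eng+fr', deskew=True) is accepted, while build_cmdline(language=['eng', 'fr']) raises the same TypeError as in the question. Python does not enforce type annotations at runtime, so the List[str] hint in the signature has no bearing on which values this check accepts.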