OCR Space API Python

Time: 2018-02-21 22:26:22

Tags: python ocr

I'm using the OCR Space API to extract text from images. I want to get just the 'ParsedText' value on its own, as a string.

import requests
import json

API_KEY = 'helloworld'  # OCR.space's free demo key; replace with your own key

def ocr_space_file(filename, overlay=False, api_key=API_KEY, language='eng'):
    """ OCR.space API request with local file.
        Python3.5 - not tested on 2.7
    :param filename: Your file path & name.
    :param overlay: Is OCR.space overlay required in your response.
                    Defaults to False.
    :param api_key: OCR.space API key.
                    Defaults to 'helloworld'.
    :param language: Language code to be used in OCR.
                    List of available language codes can be found on https://ocr.space/OCRAPI
                    Defaults to 'eng'.
    :return: Result in JSON format.
    """

    payload = {'isOverlayRequired': overlay,
               'apikey': api_key,
               'language': language,
               }
    with open(filename, 'rb') as f:
        r = requests.post('https://api.ocr.space/parse/image',
                          files={filename: f},
                          data=payload,
                          )
    # Decode the raw response body and parse the JSON into Python objects
    m = r.content.decode()
    jsonstr = json.loads(m)
    print(jsonstr["ParsedResults"])
    return jsonstr

ocr_space_file(filename='sample.png', language='eng')

Output:

[{u'ParsedText': u'Python is a great language.', u'FileParseExitCode': 1, u'ErrorMessage': u'', u'TextOverlay': {u'HasOverlay': False, u'Lines': [], u'Message': u'Text overlay is not provided as it is not requested'}, u'ErrorDetails': u''}]

I tried:

print jsonstr["ParsedResults"]["ParsedText"]

but it gives an error:

Traceback (most recent call last):
  File "img.py", line 33, in <module>
    ocr_space_file(filename='sample.png', language='eng')
  File "img.py", line 29, in ocr_space_file
    print jsonstr["ParsedResults"]["ParsedText"]
TypeError: list indices must be integers, not str

Please help me.

Thanks!

2 answers:

Answer 0 (score: 0)

Your jsonstr["ParsedResults"] is a list containing a single dictionary:

[{u'ParsedText': u'Python is a great language.', ... }]
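Indexing that list with a string key is exactly what raises the TypeError in the question. A minimal reproduction, using a hand-written dictionary modelled on the question's output (Python 3 strings, no network call, so the variable names here are illustrative only):

```python
# Sample response modelled on the question's output (no network call is made).
json_response = {
    "ParsedResults": [
        {
            "ParsedText": "Python is a great language.",
            "FileParseExitCode": 1,
            "ErrorMessage": "",
            "ErrorDetails": "",
        }
    ]
}

parsed_results = json_response["ParsedResults"]
print(type(parsed_results).__name__)  # list
print(len(parsed_results))  # 1

try:
    # This mirrors the failing line: a list indexed with a string key.
    parsed_results["ParsedText"]
except TypeError as exc:
    print("TypeError:", exc)
```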

jsonstr["ParsedResults"][0]取出字典,例如:

jsonstr["ParsedResults"][0]["ParsedText"]

Answer 1 (score: 0)

Use something like:

print jsonstr["ParsedResults"][0]["ParsedText"]