Question

I'm using tesseract for OCR on Ubuntu 18.04.

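I have a program that extracts text from an image and prints it. I want the program to create a new text file and paste the extracted content into it, but so far I can only:

copy the content to the clipboard
open a new text file in a text editor (gedit), but I don't know how to paste the copied content into it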

This is my program, which extracts text from an image:

from pytesseract import image_to_string
from PIL import Image
# print() as a function works in both Python 2 and Python 3
print(image_to_string(Image.open('sample.jpg')))

This is the code that copies the text to the clipboard:

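import subprocess

def addToClipBoard(text):
    # 'clip' is a Windows command; on Ubuntu use xclip (sudo apt install xclip).
    # Passing the text on stdin avoids the shell-quoting problems of 'echo'.
    subprocess.run(['xclip', '-selection', 'clipboard'],
                   input=text.strip().encode(), check=True)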

And this opens gedit with a new text file:

import subprocess
proc = subprocess.Popen(['gedit', 'file.txt'])

Any help would be greatly appreciated.

Answer 1

If all you need is the text, just open a text file and write the text to it:

from pytesseract import image_to_string
from PIL import Image

text = image_to_string(Image.open('sample.jpg'))

# mode='w' creates file.txt, or overwrites it if it already exists
with open('file.txt', mode='w', encoding='utf-8') as f:
    f.write(text)
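A minimal sketch of the same idea as a reusable function, assuming Python 3. `save_text` is a hypothetical helper name; it takes the text as an argument, so the file-writing part can be tried without tesseract installed. The explicit UTF-8 encoding is an assumption that mainly matters when the OCR output contains non-ASCII characters.

```python
from pathlib import Path


def save_text(text: str, path: str = "file.txt") -> Path:
    """Write text to path, replacing any existing file, and return the Path."""
    out = Path(path)
    # An explicit UTF-8 encoding keeps non-ASCII OCR output (e.g. Chinese)
    # from depending on the system locale.
    out.write_text(text, encoding="utf-8")
    return out


# Usage with the OCR call from the question (needs pytesseract and Pillow):
#   from PIL import Image
#   from pytesseract import image_to_string
#   save_text(image_to_string(Image.open('sample.jpg')), 'file.txt')
```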

Answer 2

As I suggested in the comments, create a new file and write the extracted text to it:

from pytesseract import image_to_string
from PIL import Image
with open('file.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(image_to_string(Image.open('sample.jpg')))
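Since the question's goal was to end up with the text open in gedit, here is a minimal sketch that combines both steps: write the file first, then open it with gedit, so no clipboard or pasting is needed. `write_and_open` and its `editor` parameter are hypothetical names for illustration; the default assumes `gedit` is on the PATH, as in the question's own snippet.

```python
import subprocess
from pathlib import Path


def write_and_open(text, path="file.txt", editor=("gedit",)):
    """Save text to path, then open that file in an editor without blocking.

    Returns the Popen object for the editor process.
    """
    out = Path(path)
    # Write the OCR text first, so the editor opens a file that already contains it.
    out.write_text(text, encoding="utf-8")
    # Popen starts the editor and returns immediately, like the question's snippet.
    return subprocess.Popen([*editor, str(out)])
```

With the question's OCR call, this would be `write_and_open(image_to_string(Image.open('sample.jpg')))`. Taking `editor` as a command tuple is just a convenience for swapping in another editor.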

Writing text from an image to a new text file?