Question

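I want to read a text file, extract German text from it, and write that text onto a PNG image using PIL and Python 2.7. However, when I write to the image with .text(), I get unknown characters whenever Ü or certain other non-English characters appear. I am using arialunicodems.ttf as the font.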

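First, I extract text from an image using Microsoft Azure Cognitive Vision, call .encode('utf-8') on each word, and join the words into an English sentence, which I then translate into German with the mtranslate Python library. I then draw the text on a PNG with PIL's .text() function, using arialunicodems.ttf as the font. This draws German, Chinese, Hindi, and other languages correctly. Later, I wanted to add a feature that lets users correct the translated text when the translation is wrong. To do this, I save the original and translated text in a .txt file and show its contents to the user, who can edit it as needed; the edited text is then saved back to the .txt file. A separate Python program then adds the text to the image. This time, however, the text is garbled whenever Ü appears: Ã☐ is drawn on the image instead. For Hindi, the output is complete gibberish. What could be the problem?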

Working code: the part where I join the words into a sentence (stored in the variable text).

for word in word_infos:
    bbox = [int(num) for num in word["boundingBox"].split(",")]
    if bbox[0]>=x and bbox[1]>=y and bbox[0]+bbox[2]<=x+w and bbox[1]+bbox[3]<=y+h:
        text = text+word["text"].encode('utf-8')+" "

The part where I write the text onto the image

im = Image.open("check.png")
d = ImageDraw.Draw(im)
helvetica = ImageFont.truetype("arialunicodems.ttf",10)
d.text((x,y), mtranslate.translate(text, sys.argv[3], sys.argv[2]), font=helvetica, fill=(0,0,0))

Non-working code: the part where I save the extracted text to a txt file

for word in word_infos:
    bbox = [int(num) for num in word["boundingBox"].split(",")]
    if bbox[0]>=x and bbox[1]>=y and bbox[0]+bbox[2]<=x+w and bbox[1]+bbox[3]<=y+h:
        text = text+word["text"].encode('utf-8')+" "
file.write("orignaltext:"+text+"\n")

The part where I read the text back from the txt file and write it onto the image

im = Image.open("check.png")
d = ImageDraw.Draw(im)
file2 = open("1.txt","r")
printframe = file2.readlines()
# j and traceorig are defined elsewhere to extract the text in a loop
orig = printframe[j*6+3][traceorig:len(printframe[j*6+3])-1].encode('utf-8')
# xstr, ystr, r, g, b are extracted from the image
d.text((int(xstr),int(ystr)), mtranslate.translate(orig,"de","en").encode('utf-8'), font=helvetica, fill=(int(r), int(g), int(b)))

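For the English word "Overview", I expect:
In German: Überblick
In Hindi: अवलोकन
With the updated code, the text prints correctly in the terminal, but this is what gets written on the image:
In German: Ã☐berblick
In Hindi: no characters are found; see the Hindi translated image.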
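The Ã☐ pattern is what UTF-8 bytes look like when each byte is treated as a separate character. Below is a minimal sketch of that effect, assuming (the post does not confirm this) that the byte string handed to .text() ends up being interpreted as Latin-1. The helper name misread_utf8_as_latin1 is hypothetical and only serves the illustration.

```python
# -*- coding: utf-8 -*-

def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1.

    Hypothetical helper: it mimics the assumed misinterpretation,
    it is not part of PIL.
    """
    return text.encode("utf-8").decode("latin-1")


original = u"\u00dcberblick"  # "Überblick"

# U+00DC becomes the two UTF-8 bytes 0xC3 0x9C. Read as Latin-1 they turn
# into U+00C3 ("Ã") and U+009C (a control character, usually drawn as a box).
garbled = misread_utf8_as_latin1(original)

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.encode("latin-1").decode("utf-8")
```

The garbled string is one character longer than the original because the single character Ü became two. Hindi characters take three UTF-8 bytes each, which would fit the Hindi output being unreadable throughout.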

Update 1:

Sample code that produces a similar result

#!/usr/bin/python
# -*- coding: utf-8 -*-
from PIL import Image, ImageDraw, ImageFont, ImageFilter
import cv2
import numpy as np
import sys
import os
reload(sys)
sys.setdefaultencoding('utf8')
#file has only one line with text "Überblick"
file1 = open("write.txt","w+")
file1.write("Überblick")
file1.close()
file2 = open("write.txt","r")
content = file2.readlines()
file2.close()
img = np.zeros((300,300,1), np.uint8)
cv2.imwrite("stack.png",img)
im = Image.open("stack.png")
d = ImageDraw.Draw(im)
helvetica = ImageFont.truetype("arialunicodems.ttf",50)
d.text((0,100), content[0].encode('utf-8'), font=helvetica, fill="white")
im.save("processed.png")
os.remove("stack.png")

For the output, see processed.png. The font used is the arialunicodems.ttf file.

Answer 1

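So, I figured it out myself. Anyone who has trouble writing Unicode text on an image with Python 2.x and PIL should first read the linked reference, which is very helpful on text encoding across different Python versions. The answer is to use unicode(): remove .encode('utf-8') and write the call like this: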

d.text((0,100), unicode(content[0]), font=helvetica, fill="white")

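unicode() converts a string to a unicode string, much as str() converts to a byte string. Note that unicode() decodes a byte string using the default encoding, so this call relies on the sys.setdefaultencoding('utf8') line in the sample code; content[0].decode('utf-8') names the encoding explicitly. Hope this helps anyone who needs it.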
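An alternative that avoids depending on the default encoding is to decode at the file boundary, so that every line is already a unicode string by the time it reaches .text(). The following is a minimal sketch of that idea, not the code from the post; the helpers write_lines and read_lines and the temporary file path are assumptions made for illustration. io.open accepts an encoding argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_lines(path, lines):
    """Write unicode strings to path as UTF-8, one per line (hypothetical helper)."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + u"\n")


def read_lines(path):
    """Read path as UTF-8 and return unicode strings without trailing newlines."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\n") for line in handle]


# A temporary file stands in for the write.txt used in the sample code.
path = os.path.join(tempfile.mkdtemp(), "write.txt")
write_lines(path, [u"Überblick", u"अवलोकन"])
content = read_lines(path)

# content[0] is already unicode text, so it can be passed to
# d.text((0, 100), content[0], font=helvetica, fill="white")
# without calling .encode('utf-8') or unicode() on it.
```

With this approach the encoding is stated once, where the file is opened, and the drawing code never handles raw bytes.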

Problem writing German text on an image using Python and PIL
