How can I make my program fully (or nearly) Unicode-compatible?

Time: 2015-01-28 04:46:44

Tags: python unicode utf-8 reddit pillow

I wrote a Reddit bot that finds posts which don't display well on mobile devices and converts them to images. One problem I've run into is that it doesn't handle Unicode symbols well: http://www.reddit.com/r/pics/comments/2lgf08/mom_i_thought_you_were_taking_me_to_see_harry/clumkde http://www.reddit.com/r/mobilewizard/comments/2j62ix/html_entity_test_part_ii/cl8plni

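As you can see, I can get basic HTML entities to work (because I use HTMLParser to convert the entities and then encode the result as UTF-8), but fancier symbols don't work. Is this a limitation of the Python Imaging Library, or is there something I can do? I thought converting to UTF-8 would be enough. In case it matters, the font I'm using is Courier New.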
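To make the text-versus-bytes distinction in the question concrete, here is a minimal sketch. It assumes Python 3 (where `html.unescape` stands in for the Python 2 `HTMLParser.unescape` call used by the bot) and uses only the standard library. The sample string and the names `decode_entities`, `encoded`, and `mojibake` are illustrative and not taken from the bot. It shows that decoded entities are Unicode text, that encoding them to UTF-8 produces multi-byte sequences, and what a consumer reading those bytes one at a time would see. Whether that is what happens inside the imaging library in this case is an assumption, not something the post confirms.

```python
import html


def decode_entities(raw):
    """Turn HTML entities into Unicode text and keep the result as text (str)."""
    return html.unescape(raw)


# Illustrative input: a named entity, a numeric entity, and an accented letter.
text = decode_entities("&hearts; &#9731; caf&eacute;")

# Encoding to UTF-8 yields bytes; the heart and snowman take 3 bytes each,
# the accented e takes 2.
encoded = text.encode("utf-8")

# A consumer that treats every byte as one character (simulated here by
# decoding as Latin-1) sees more "characters" than the text contains.
mojibake = encoded.decode("latin-1")

print(len(text), len(encoded), len(mojibake))
```

If the drawing call accepts Unicode text directly, passing `text` rather than `encoded` avoids this mismatch; any symbol that still fails to render would then point at the font rather than the encoding.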

All of the code is here:

from PIL import Image, ImageDraw, ImageFont
from cStringIO import StringIO

import HTMLParser

def str_to_img(str):
    """Converts a given string to a PNG image, and saves it to the return variable"""
    # use 12pt Courier New for ASCII art
    font = ImageFont.truetype("cour.ttf", 12)

    # do some string preprocessing
    str = str.replace("\n\n", "\n")  # Reddit requires double newline for new line, don't let the bot do this
    h = HTMLParser.HTMLParser()
    str = h.unescape(str).encode('utf-8')  # convert HTML entities to plain text

    # create a placeholder image to determine correct image
    img = Image.new('RGB', (1,1))
    d = ImageDraw.Draw(img)

    str_by_line = str.split("\n")
    num_of_lines = len(str_by_line)

    line_widths = []
    for i, line in enumerate(str_by_line):
        line_widths.append(d.textsize(str_by_line[i], font=font)[0])
    line_height = d.textsize(str, font=font)[1]  # the height of a line of text should be unchanging

    img_width = max(line_widths)  # the image width is the largest of the individual line widths
    img_height = num_of_lines * line_height  # the image height is the # of lines * line height

    # creating the output image
    # add 5 pixels to account for lowercase letters that might otherwise get truncated
    img = Image.new('RGB', (img_width, img_height + 5), 'white')
    d = ImageDraw.Draw(img)

    for i, line in enumerate(str_by_line):
        d.text((0,i*line_height), line, font=font, fill='black')
    output = StringIO()
    img.save(output, format='PNG')

    return output
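Because the rendered result also depends on which glyphs the chosen font contains, it can help to list the non-ASCII code points in a post before drawing it. The following is a minimal diagnostic sketch, assuming Python 3 and the standard `unicodedata` module. The helper name `non_ascii_report` and the sample text are hypothetical and not part of the bot. It does not check the font itself; it only reports which characters the font would need to cover.

```python
import unicodedata


def non_ascii_report(text):
    """Return (character, code point, Unicode name) for each distinct
    non-ASCII character, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNNAMED"))
        for ch in seen
    ]


# Illustrative input: an accented letter, a heart (twice), and a snowman.
report = non_ascii_report("caf\u00e9 \u2665 \u2603 \u2665")
for ch, code_point, name in report:
    print(code_point, name)
```

Characters in the report that show up as boxes or blanks in the image are candidates for glyphs missing from the font, as opposed to an encoding problem.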

0 answers:

No answers