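I have a UTF-8 encoded file whose content is in Tamil (an Indian language). I need to read the file's contents and produce a PDF from them. I am using the reportlab Python module to do this.
I can open the file, read its contents, and print them to the terminal, where they display perfectly. But when I write the content to a PDF with reportlab, some characters (composite characters formed by combining two glyphs) come out with the order of their parts reversed. I have set a Tamil font for the reportlab paragraph style. Am I missing something?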
from reportlab.pdfbase import pdfmetrics
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.lib.enums import TA_JUSTIFY
from reportlab.pdfbase.ttfonts import TTFont
pdfmetrics.registerFont(TTFont('Latha', '/home/srinivas/Fonts/latha/latha.ttf'))
from os import listdir
from os.path import isdir, isfile, join
import random
import codecs
from tamil import utf8 as tamil
PATH = 'tamil_file'
num_sets = 1
pages_per_set = 12
num_articles_per_page = 2
styles = getSampleStyleSheet()
# fontName must match the name passed to registerFont above ('Latha'); 'Mangal' is never registered
styles.add(ParagraphStyle(name='CustomPara', fontName='Latha', fontSize=14, alignment=TA_JUSTIFY, leading=24))
style = styles['CustomPara']
styleH = styles['Heading1']
for set_idx in range(num_sets):
    doc = SimpleDocTemplate(str(set_idx) + '.pdf', pagesize=A4)
    story = []
    for page in range(pages_per_set):
        # `id` and `selected_file` are not defined anywhere in this excerpt
        story.append(Spacer(1, 0.1 * inch))
        story.append(Paragraph(id, styleH))
        story.append(Spacer(1, 0.1 * inch))
        lines = u''  # start each page's text empty before accumulating
        with codecs.open(join(PATH, selected_file), 'r', 'utf-8') as f:
            for l in f.readlines():
                print(l)  # prints correctly in terminal
                lines += l
        story.append(Paragraph(lines, style))
        story.append(PageBreak())
    doc.build(story)
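Actual text: நாவல் மரத்தின் மருத்துவப் பயன்கள் போற்றத்தக்கவை
Note: if I copy the text from the PDF and paste it here, it displays correctly (the incorrectly rendered text is in the attached image)!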
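Since the copied text pastes correctly, the code points in the PDF seem to be stored in the right (logical) order, so the reversal may be happening only when the glyphs are drawn. Some Tamil vowel signs are stored after their consonant but drawn to its left, or split around it, and a renderer that draws glyphs in stored order without shaping would put them on the wrong side. Below is a minimal diagnostic sketch, written for Python 3 and using only the standard library, that lists where such signs occur in a string. The function name `find_reordered_signs` and the two sets are my own naming for illustration, and that this is the actual cause is an assumption, not something I have confirmed.

```python
import unicodedata

# Tamil vowel signs stored AFTER the consonant but drawn to its LEFT.
PREBASE_SIGNS = {'\u0bc6', '\u0bc7', '\u0bc8'}   # E, EE, AI
# Two-part vowel signs drawn on BOTH sides of the consonant.
SPLIT_SIGNS = {'\u0bca', '\u0bcb', '\u0bcc'}     # O, OO, AU


def find_reordered_signs(text):
    """Return (index, code point, Unicode name) for every vowel sign in
    `text` whose visual position differs from its stored position."""
    hits = []
    for index, ch in enumerate(text):
        if ch in PREBASE_SIGNS or ch in SPLIT_SIGNS:
            hits.append((index, 'U+%04X' % ord(ch), unicodedata.name(ch)))
    return hits


# The sample sentence from the question.
sample = 'நாவல் மரத்தின் மருத்துவப் பயன்கள் போற்றத்தக்கவை'
for hit in find_reordered_signs(sample):
    print(hit)
```

If the positions it reports match the characters that look reversed in the PDF, that would point to missing glyph reordering rather than to a problem with the file's encoding or the font registration.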