Is there any Python package that can parse .po files (messages, including context and comments)?

Asked: 2013-03-24 16:23:24

Tags: python internationalization translation globalization

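I need to merge, update, and delete messages in .po files, so I am looking for a Python package that parses .po files completely: messages, plurals, locations, contexts, and comments.

I want to build a simple tool that checks the differences between files. I could also use an existing GUI, but I am not sure there is one that can add new translations or delete unused ones.

I have searched for articles but have not found how to do this. Please recommend a Python package that fully parses .po files (a library in another language would be fine too), or a tool that handles this kind of task, which matters for keeping translations in good shape.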

3 answers:

Answer 0 (score: 4)

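The polib package is very good. It parses the file and gives you several ways to access the data, including iterators that loop over the msgid/msgstr pairs so you can do whatever you need with them. See the Quick Start documentation.

If the .po file is not available it can also parse the .mo file. It handles obsolete message strings specially, can iterate over only the translated strings, and has other nice features.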
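As a rough illustration of the access patterns described above, here is a minimal sketch. It assumes polib is installed (`pip install polib`). The sample data and the `summarize` helper are made up for this example, and the attribute names follow polib's entry objects as documented in its quick start.

```python
try:
    import polib  # third-party package; assumed to be installed
except ImportError:  # keeps the sketch importable when polib is missing
    polib = None

# Made-up sample: a header, a commented entry with context, a plural
# entry, and an obsolete entry.
SAMPLE_PO = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

# translator note
#. note extracted from the source code
#: buttons.c:425
msgctxt "button"
msgid "Extra"
msgstr "Thêm"

#: files.c:10
msgid "%d file"
msgid_plural "%d files"
msgstr[0] "%d tập tin"

#~ msgid "Old"
#~ msgstr "Cũ"
'''


def summarize(po_text):
    """Return one dict per entry with the fields the question asks about."""
    po = polib.pofile(po_text)  # accepts a path or the file content
    return [
        {
            "context": entry.msgctxt,
            "msgid": entry.msgid,
            "msgstr": entry.msgstr,
            "plural": entry.msgid_plural,
            "plural_forms": dict(entry.msgstr_plural),
            "locations": entry.occurrences,       # (file, line) pairs
            "extracted_comment": entry.comment,   # "#." lines
            "translator_comment": entry.tcomment, # "# " lines
            "obsolete": bool(entry.obsolete),     # "#~" entries
        }
        for entry in po
    ]


if polib is not None:
    for row in summarize(SAMPLE_PO):
        print(row)
```

For the filtered views mentioned in the answer, polib also offers `po.translated_entries()`, `po.untranslated_entries()`, and `po.obsolete_entries()`, and `polib.mofile()` reads compiled catalogs.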

Answer 1 (score: 2)

Try the babel module. It includes a .po parser: see babel.messages.catalog and babel.messages.pofile.
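A minimal sketch of reading a catalog with Babel follows. It assumes Babel is installed (`pip install babel`). The sample data and the `list_messages` helper are hypothetical, and the example relies only on `read_po` from `babel.messages.pofile` and the attributes of the `Message` objects it yields.

```python
import io

try:
    from babel.messages.pofile import read_po  # third-party; assumed installed
except ImportError:  # keeps the sketch importable when Babel is missing
    read_po = None

# Made-up sample data for illustration.
SAMPLE_PO = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#. extracted comment
#: buttons.c:425
msgctxt "button"
msgid "Extra"
msgstr "Thêm"

#: buttons.c:433
msgid "Help"
msgstr "Trợ giúp"
'''


def list_messages(po_text):
    """Return (context, id, string, locations, comments) per message."""
    catalog = read_po(io.BytesIO(po_text.encode("utf-8")))
    return [
        (m.context, m.id, m.string, m.locations, m.auto_comments)
        for m in catalog
        if m.id  # skip the header entry, whose id is empty
    ]


if read_po is not None:
    for message in list_messages(SAMPLE_PO):
        print(message)
```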

Answer 2 (score: 1)

You don't need fancy tools to read .po files; they are plain text files that basically contain message/translation pairs:

#: buttons.c:425
msgid "Extra"
msgstr "Thêm"

#: buttons.c:433
msgid "Help"
msgstr "Trợ giúp"

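For a simple tool to compare them, I would suggest diff -u.

There is also a binary format with the .mo extension. You can convert those files back to plain text with the msgunfmt program from the gettext-tools package.

Extracting the ID/translation pairs from a .po file is not difficult: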

In [1]: po = '''#: buttons.c:425
   ...: msgid "Extra"
   ...: msgstr "Thêm"
   ...: 
   ...: #: buttons.c:433
   ...: msgid "Help"
   ...: msgstr "Trợ giúp"
   ...: 
   ...: '''

In [2]: import re

In [3]: re.findall('^msgid \"(.*)\"', po, re.MULTILINE)
Out[3]: ['Extra', 'Help']

In [4]: re.findall('^msgstr \"(.*)\"', po, re.MULTILINE)
Out[4]: ['Th\xc3\xaam', 'Tr\xe1\xbb\xa3 gi\xc3\xbap']

In [5]: zip(re.findall('^msgid \"(.*)\"[^\"]*', po, re.MULTILINE), re.findall('^msgstr \"(.*)\"[^\"]*', po, re.MULTILINE))
Out[5]: [('Extra', 'Th\xc3\xaam'), ('Help', 'Tr\xe1\xbb\xa3 gi\xc3\xbap')]

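I am using ^ together with re.MULTILINE to keep commented-out messages from showing up here. As a sanity check, make sure the list of message IDs and the list of message strings have the same length.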
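The session above is Python 2, where `zip` returns a list and strings are byte strings. A Python 3 sketch of the same idea, with the length check built in, might look like the following. The `extract_pairs` helper and the extra obsolete entry in the sample are additions for illustration. Like the regular expressions above, it only handles single-line msgid/msgstr pairs, so plural forms, msgctxt, and strings continued over several lines are not covered.

```python
import re

# Sample data from the answer, plus one commented-out (obsolete) entry.
PO_TEXT = '''#: buttons.c:425
msgid "Extra"
msgstr "Thêm"

#: buttons.c:433
msgid "Help"
msgstr "Trợ giúp"

#~ msgid "Old"
#~ msgstr "Cũ"
'''

# ^ with re.MULTILINE anchors at the start of each line, so lines that
# begin with "#~" are skipped.
MSGID_RE = re.compile(r'^msgid "(.*)"', re.MULTILINE)
MSGSTR_RE = re.compile(r'^msgstr "(.*)"', re.MULTILINE)


def extract_pairs(po_text):
    """Return a list of (msgid, msgstr) tuples from single-line entries."""
    ids = MSGID_RE.findall(po_text)
    strings = MSGSTR_RE.findall(po_text)
    if len(ids) != len(strings):  # the sanity check described above
        raise ValueError("found %d msgid lines but %d msgstr lines"
                         % (len(ids), len(strings)))
    return list(zip(ids, strings))


pairs = extract_pairs(PO_TEXT)
print(pairs)  # [('Extra', 'Thêm'), ('Help', 'Trợ giúp')]
```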

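Edit: You have a valid point about random order and using diff. But you can use the code above to build lists of (message-id, translation) tuples for the old and the new .po files. If you sort these lists by message ID, you can use difflib.unified_diff to print the differences.

For example: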

In [1]: import re, itertools, difflib

#I've used cpaste to input two pieces of a .po file, the latter with some changes

In [4]: orig_po
Out[4]: '#: mixedgauge.c:64\nmsgid "Passed"\nmsgstr "\xc4\x90\xe1\xbb\x97"\n\n#: mixedgauge.c:67\nmsgid "Completed"\nmsgstr "Ho\xc3\xa0n to\xc3\xa0n"\n\n#: mixedgauge.c:70\nmsgid "Checked"\nmsgstr "\xc4\x90\xc3\xa3 ki\xe1\xbb\x83m tra"\n\n#: mixedgauge.c:73\nmsgid "Done"\nmsgstr "Ho\xc3\xa0n t\xe1\xba\xa5t"\n\n#: mixedgauge.c:76\nmsgid "Skipped"\nmsgstr "B\xe1\xbb\x8b b\xe1\xbb\x8f qua"\n\n#: mixedgauge.c:79\nmsgid "In Progress"\nmsgstr "\xc4\x90ang ch\xe1\xba\xa1y"\n\n#: mixedgauge.c:85\nmsgid "N/A"\nmsgstr "Kh\xc3\xb4ng c\xc3\xb3"\n\n#: mixedgauge.c:193\nmsgid "Overall Progress"\nmsgstr "To\xc3\xa0n ti\xe1\xba\xbfn h\xc3\xa0nh"\n'

In [5]: changed_po
Out[5]: '#: mixedgauge.c:64\nmsgid "Passed"\nmsgstr "\xc4\x90\xe1\xbb\x97"\n\n#: mixedgauge.c:193\nmsgid "Overall Progres"\nmsgstr "To\xc3\xa0n ti\xe1\xba\xbfn h\xc3\xa0nh"\n\n#: mixedgauge.c:67\nmsgid "Completed"\nmsgstr "Ho\xc3\xa0na to\xc3\xa0n"\n\n#: mixedgauge.c:76\nmsgid "Skipped"\nmsgstr "B\xe1\xbb\x8b b\xe1\xbb\x8f qua"\n\n#: mixedgauge.c:79\nmsgid "In Progress"\nmsgstr "\xc4\x90ang ch\xe1\xba\xa1y"\n\n#: mixedgauge.c:85\nmsgid "N/A"\nmsgstr "Kh\xc3\xb4ng c\xc3\xb3e"\n\n#: mixedgauge.c:70\nmsgid "Checked"\nmsgstr "\xc4\x90\xc3\xa3 ki\xe1\xbb\x83m tra"\n\n#: mixedgauge.c:73\nmsgid "Done"\nmsgstr "Ho\xc3\xa0n t\xe1\xba\xa5t"\n'

# Making a list of tuples

In [6]: orig_list = zip(re.findall('^(msgid \".*\")', orig_po, re.MULTILINE), re.findall('^(msgstr \".*\")', orig_po, re.MULTILINE))

In [7]: changed_list = zip(re.findall('^(msgid \".*\")', changed_po, re.MULTILINE), re.findall('^(msgstr \".*\")', changed_po, re.MULTILINE))

# Sort them by the message-id

In [8]: orig_list.sort(key=lambda t: t[0])

In [9]: changed_list.sort(key=lambda t: t[0])

# Now flatten the list

In [10]: orig_string_list = [i for i in itertools.chain(*orig_list)]

In [11]: changed_string_list = [i for i in itertools.chain(*changed_list)]

In [12]: orig_list[0:3]
Out[12]: [('msgid "Checked"', 'msgstr "\xc4\x90\xc3\xa3 ki\xe1\xbb\x83m tra"'), ('msgid "Completed"', 'msgstr "Ho\xc3\xa0n to\xc3\xa0n"'), ('msgid "Done"', 'msgstr "Ho\xc3\xa0n t\xe1\xba\xa5t"')]

In [13]: orig_string_list[0:6]
Out[13]: ['msgid "Checked"', 'msgstr "\xc4\x90\xc3\xa3 ki\xe1\xbb\x83m tra"', 'msgid "Completed"', 'msgstr "Ho\xc3\xa0n to\xc3\xa0n"', 'msgid "Done"', 'msgstr "Ho\xc3\xa0n t\xe1\xba\xa5t"']

# print the diff

In [14]: for l in difflib.unified_diff(orig_string_list, changed_string_list, fromfile='original', tofile='changed'):
   ....:     print l
   ....:     
--- original

+++ changed

@@ -1,14 +1,14 @@

 msgid "Checked"
 msgstr "Đã kiểm tra"
 msgid "Completed"
-msgstr "Hoàn toàn"
+msgstr "Hoàna toàn"
 msgid "Done"
 msgstr "Hoàn tất"
 msgid "In Progress"
 msgstr "Đang chạy"
 msgid "N/A"
-msgstr "Không có"
-msgid "Overall Progress"
+msgstr "Không cóe"
+msgid "Overall Progres"
 msgstr "Toàn tiến hành"
 msgid "Passed"
 msgstr "Đỗ"
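The transcript above is a Python 2 session (`print l`, and `zip` returning a list that can be sorted in place). Under Python 3 the same approach could be sketched as below. The helper names `sorted_lines` and `po_diff` and the shortened sample data are assumptions made for this example. Passing `lineterm=''` avoids the blank lines after the diff headers seen in the output above.

```python
import difflib
import re


def sorted_lines(po_text):
    """Return msgid/msgstr lines as a flat list, sorted by msgid."""
    ids = re.findall(r'^(msgid ".*")', po_text, re.MULTILINE)
    strings = re.findall(r'^(msgstr ".*")', po_text, re.MULTILINE)
    pairs = sorted(zip(ids, strings), key=lambda pair: pair[0])
    return [line for pair in pairs for line in pair]


def po_diff(old_text, new_text):
    """Return unified diff lines that ignore the order of the entries."""
    return list(difflib.unified_diff(
        sorted_lines(old_text), sorted_lines(new_text),
        fromfile='original', tofile='changed', lineterm=''))


# Shortened sample: same entries in a different order, one changed.
OLD = '''#: mixedgauge.c:64
msgid "Passed"
msgstr "Đỗ"

#: mixedgauge.c:67
msgid "Completed"
msgstr "Hoàn toàn"
'''

NEW = '''#: mixedgauge.c:67
msgid "Completed"
msgstr "Hoàna toàn"

#: mixedgauge.c:64
msgid "Passed"
msgstr "Đỗ"
'''

diff = po_diff(OLD, NEW)
for line in diff:
    print(line)
```

Because both lists are sorted by msgid before diffing, the reordered "Passed" entry does not show up as a change; only the edited translation does.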