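I'm having encoding problems with the text in the JSON returned by the Google Translate API. I'm also a beginner with both Python and the Google APIs.
Below is a basic script that reads structure IDs from a CSV file, selects each structure's English description from the database, and tries to write the translated description into another table.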
After the translation step:

```python
t = service.translations().list(source='%s' % trans, \
    target='%s' % lang, q=[message2t]).execute()
translated = t['translations'][0]['translatedText']
```
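my unicode variable `translated` contains garbled characters (the problem appears with languages such as German and French), and I don't know how to get the correct characters.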
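One possible cause, assuming the garbled characters look like `&#39;` or `&quot;`: the Translate v2 API treats the input as HTML by default and returns HTML-escaped `translatedText`. The sketch below undoes that escaping before the text is stored; the helper name `clean_translation` is hypothetical. The v2 API also accepts a `format` parameter (`html` by default, or `text`), which may avoid the escaping altogether.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    # Python 2 fallback: undocumented but long-standing HTMLParser helper
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def clean_translation(translated_text):
    """Turn HTML entities such as &#39; or &quot; back into characters."""
    return unescape(translated_text)


# Entities are decoded; the already-correct accented letters are untouched.
cleaned = clean_translation(u'C&#39;est l&#39;\xe9t\xe9')  # u"C'est l'\xe9t\xe9"
```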
Indeed, when I try to write the string to the database, I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 225: ordinal not in range(128)
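The error most likely comes from an implicit conversion: in Python 2, `translated` is a `unicode` object, so formatting it into the query makes the whole query `unicode`, and turning that back into bytes uses the default ASCII codec, which cannot represent `u'\xfc'` (ü). The minimal sketch below reproduces the failure and shows the explicit UTF-8 encoding that avoids it; `to_utf8` is a hypothetical helper name.

```python
def to_utf8(text):
    """Encode a unicode string as UTF-8 bytes instead of relying on ASCII."""
    return text.encode('utf-8')


word = u'Gr\xfc\xdfe'  # "Grüße", contains u'\xfc' like the failing string

try:
    # Roughly what the implicit conversion does with the default codec
    word.encode('ascii')
except UnicodeEncodeError as exc:
    print('ASCII fails: ' + exc.reason)  # ordinal not in range(128)

encoded = to_utf8(word)
print(repr(encoded))
```

Encoding alone only avoids the exception; presumably the database connection and the `descriptions.text` column also need a UTF-8 character set, otherwise the stored text may still come out garbled.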
Here is the complete basic script:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from googleapiclient.discovery import build
import _mysql
import time

trans = 'en'  # source language of the descriptions
# Note: Google uses ISO 639-1 codes: Danish is 'da', Portuguese 'pt',
# Swedish 'sv' ('sw' is Swahili).
langs = ['de', 'dk', 'es', 'fr', 'it', 'nl', 'no', 'pg', 'pl', 'sw']

# Google API Environment
key = 'MYKEY'
service = build('translate', 'v2', developerKey=key)

# Open DB connection
db = _mysql.connect(user='MYUSER',
                    passwd='MYPASSWORD',
                    host='MYRDS',
                    port=3306,
                    db='MYDB')

for lang in langs:
    print 'Finding structures w/o description in {} language'.format(lang.upper())
    # Tab-separated file; the first field of each line is the structure ID
    with open('nodesc_%s.csv' % lang, 'r') as structures:
        for structure in structures:
            id_str = structure.split('\t')[0]
            # Fetch the English description of this structure
            text2t = """SELECT `text` FROM `texts` WHERE
                `str_ID`='%s' AND
                `type`='description' AND
                `lang`='%s';""" % (id_str, trans)
            db.query(text2t)
            r = db.store_result()
            message2t = r.fetch_row()[0][0]
            # Check if there is a description for real
            if len(message2t) != 0:
                t = service.translations().list(source=trans,
                                                target=lang,
                                                q=[message2t]).execute()
                translated = t['translations'][0]['translatedText']
                now = time.strftime("%Y-%m-%d %H:%M:%S")
                # translated is unicode, so this query string becomes unicode too
                texttranslated = """INSERT INTO `descriptions`
                    (`ID_desc`, `ID_str`, `text`, `lang`, `human_date`, `google_date`)
                    VALUES (NULL, '%s', '%s', '%s', '0000-00-00 00:00:00', '%s')""" \
                    % (id_str, translated, lang, now)
                db.query(texttranslated)  # the UnicodeEncodeError is raised here
            else:
                print 'Structure with id {} has no description in English'.format(id_str)

# Close DB connection
db.close()
```
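A sketch of a more robust insert, using the standard-library `sqlite3` module purely as a runnable stand-in for MySQL: bound parameters hand the unicode value to the driver instead of splicing it into the SQL string, so neither non-ASCII letters nor apostrophes (common in French translations) can break the query. With MySQLdb the equivalent would presumably be `MySQLdb.connect(..., charset='utf8', use_unicode=True)` plus `cursor.execute(sql, params)` with `%s` placeholders. The table layout and the helper `insert_translation` are assumptions modeled on the INSERT in the script.

```python
import sqlite3
import time


def insert_translation(conn, id_str, translated, lang):
    """Insert one translated description using bound parameters."""
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    # '?' is sqlite3's placeholder style; MySQLdb uses '%s' instead.
    conn.execute(
        "INSERT INTO descriptions (ID_str, text, lang, human_date, google_date) "
        "VALUES (?, ?, ?, '0000-00-00 00:00:00', ?)",
        (id_str, translated, lang, now),
    )
    conn.commit()


# In-memory table mimicking the columns used by the original INSERT;
# INTEGER PRIMARY KEY makes ID_desc auto-assigned, like NULL in MySQL.
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE descriptions (ID_desc INTEGER PRIMARY KEY, ID_str TEXT, "
    "text TEXT, lang TEXT, human_date TEXT, google_date TEXT)"
)

# The apostrophes would break the original '%s'-formatted query.
insert_translation(conn, '42', u"C'est l'\xe9t\xe9", 'fr')
stored = conn.execute(
    "SELECT text FROM descriptions WHERE ID_str = ?", ('42',)
).fetchone()[0]
```

Binding parameters also removes the SQL-injection risk of building queries with `%`.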