我的python脚本现在正在运行,但我遇到了一些麻烦:
这是输出:
from BeautifulSoup import BeautifulSoup
import urllib
langCode={
"arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
"croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
"english":"en", "finnish":"fi", "french":"fr", "german":"de",
"greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
"korean":"ko", "norwegian":"no", "polish":"pl", "portugese":"pt",
"romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }
def setUserAgent(userAgent):
urllib.FancyURLopener.version = userAgent
pass
def translate(text, fromLang, toLang):
setUserAgent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008070400 SUSE/3.0.1-0.1 Firefox/3.0.1")
try:
postParameters = urllib.urlencode({"langpair":"%s|%s" %(langCode[fromLang.lower()],langCode[toLang.lower()]), "text":text,"ie":"UTF8", "oe":"UTF8"})
except KeyError, error:
print "Currently we do not support %s" %(error.args[0])
return
page = urllib.urlopen("http://translate.google.com/translate_t", postParameters)
content = page.read()
page.close()
htmlSource = BeautifulSoup(content)
translation = htmlSource.find('span', title=text )
return translation.renderContents()
print translate("Good morning to you friend!", "English", "German")
print translate("Good morning to you friend!", "English", "Italian")
print translate("Good morning to you friend!", "English", "Spanish")
Guten Morgen, du Freund!
Buongiorno a te amico!
Buenos dÃas a ti amigo!
如何管理不是基本英文字母的字母?你会怎么建议我解决这个问题?我在想一本字典用另一个字符替换某些链,但我确信Python已经有了这样的东西。包括电池和诸如此类的东西。 :P
感谢。
答案 0 :(得分:1)
请勿解析http://translate.google.com/translate_t
,因为Google会为此目的提供AJAX服务。 translatedText
返回的json
数据中的ajax.googleapis.com
已经是unicode字符串。
import urllib2
import urllib
import sys
import json
LANG={
"arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
"croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
"english":"en", "finnish":"fi", "french":"fr", "german":"de",
"greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
"korean":"ko", "norwegian":"no", "polish":"pl", "portugese":"pt",
"romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }
def translate(text,lang1,lang2):
base_url='http://ajax.googleapis.com/ajax/services/language/translate?'
langpair='%s|%s'%(LANG.get(lang1.lower(),lang1),
LANG.get(lang2.lower(),lang2))
params=urllib.urlencode( (('v',1.0),
('q',text.encode('utf-8')),
('langpair',langpair),) )
url=base_url+params
content=urllib2.urlopen(url).read()
try: trans_dict=json.loads(content)
except AttributeError:
try: trans_dict=json.load(content)
except AttributeError: trans_dict=json.read(content)
return trans_dict['responseData']['translatedText']
print translate("Good morning to you friend!", "English", "German")
print translate("Good morning to you friend!", "English", "Italian")
print translate("Good morning to you friend!", "English", "Spanish")
产量
Guten Morgen, du Freund!
Buongiorno a te amico!
Buenos días a ti amigo!
答案 1 :(得分:0)
从urlopen()
返回的标头中解析正确的字符集,并将其作为fromEncoding
参数传递给BeautifulSoup
构造函数。