Machine translation with babelize_shell() in NLTK

Time: 2012-09-04 16:17:29

Tags: python nltk

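Hello, I am learning natural language processing with NLTK, and I am trying to reproduce the babelize_shell() example from the book. I call babelize_shell(), type my string, then type german as the book says, and then type run.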

The error I get is:

Traceback (most recent call last):
  File "<pyshell#148>", line 1, in <module>
    babelize_shell()
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 175, in babelize_shell
    for count, new_phrase in enumerate(babelize(phrase, 'english', language)):
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 126, in babelize
    phrase = translate(phrase, next, flip[next])
  File "C:\Python27\lib\site-packages\nltk\misc\babelfish.py", line 106, in translate
    if not match: raise BabelfishChangedError("Can't recognize translated string.")
BabelfishChangedError: Can't recognize translated string.

Here is a sample session:

>>> babelize_shell()
NLTK Babelizer: type 'help' for a list of commands.
Babel> how long before the next flight to Alice Springs?
Babel> german
Babel> run
0> how long before the next flight to Alice Springs?
1> wie lang vor dem folgenden Flug zu Alice Springs?
2> how long before the following flight to Alice jump?
3> wie lang vor dem folgenden Flug zu Alice springen Sie?
4> how long before the following flight to Alice do you jump?
5> wie lang, bevor der folgende Flug zu Alice tun, Sie springen?
6> how long, before the following flight to Alice does, do you jump?
7> wie lang bevor der folgende Flug zu Alice tut, tun Sie springen?
8> how long before the following flight to Alice does, do you jump?
9> wie lang, bevor der folgende Flug zu Alice tut, tun Sie springen?
10> how long, before the following flight does to Alice, do do you jump?
11> wie lang bevor der folgende Flug zu Alice tut, Sie tun Sprung?
12> how long before the following flight does leap to Alice, does you?
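
The session above bounces the phrase between English and German until a result repeats or a cap on the number of translations is reached. A minimal sketch of that loop follows, assuming a caller-supplied `translate` function. The `toy_translate` lookup table is made up for illustration, and this is not NLTK's own code.

```python
def babelize(phrase, translate, source, target, limit=12):
    """Yield the phrase, then each successive round-trip translation.

    Stops as soon as a translation has been seen before, or after
    `limit` translations.
    """
    seen = {phrase}
    yield phrase
    src, dst = source, target
    for _ in range(limit):
        phrase = translate(phrase, src, dst)
        if phrase in seen:
            break
        seen.add(phrase)
        yield phrase
        src, dst = dst, src  # flip direction for the next hop


# Hypothetical stand-in for a real translation service.
TOY_TABLE = {
    ("next flight", "english", "german"): "folgender Flug",
    ("folgender Flug", "german", "english"): "following flight",
    ("following flight", "english", "german"): "folgender Flug",
}


def toy_translate(phrase, source, target):
    return TOY_TABLE[(phrase, source, target)]


for count, new_phrase in enumerate(
        babelize("next flight", toy_translate, "english", "german")):
    print("%d> %s" % (count, new_phrase))
```

Passing the translator in as an argument keeps the loop independent of whichever web service happens to be available.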

1 answer:

Answer 0 (score: 7)

I am running into the same problem right now.

I found this: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.misc.babelfish-module.html

It says: BabelfishChangedError is thrown when babelfish.yahoo.com changes some detail of its HTML layout, and babelizer no longer submits data in the correct form, or can no longer parse the results.
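
To illustrate that failure mode, here is a small sketch of a scraper that depends on a fixed page layout. The exception class, the regular expression, and the `<div id="result">` markup are all assumptions made for this example; they are not the pattern babelfish.py actually uses.

```python
import re


class LayoutChangedError(Exception):
    """Hypothetical stand-in for BabelfishChangedError."""


# Assumed layout: the translated text sits inside <div id="result">...</div>.
RESULT_RE = re.compile(r'<div id="result">(.*?)</div>', re.DOTALL)


def extract_translation(html):
    """Return the translated string, or raise if the layout is unrecognized."""
    match = RESULT_RE.search(html)
    if not match:
        raise LayoutChangedError("Can't recognize translated string.")
    return match.group(1).strip()


# Works while the page matches the expected layout...
print(extract_translation('<html><div id="result"> wie lang </div></html>'))

# ...and fails as soon as the site changes its markup.
try:
    extract_translation('<html><span class="out">wie lang</span></html>')
except LayoutChangedError as err:
    print("scrape failed:", err)
```

Any scraper built this way breaks silently on the server side and loudly on the client side, which matches the traceback in the question.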

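I am going to see whether there is a way to fix this.

The solution I have come up with uses the Microsoft Translator web service (SOAP). It is not a simple solution, but it was fun to code.

I followed the instructions at http://msdn.microsoft.com/en-us/library/hh454950 and then modified babelfish.py in nltk/misc/babelfish.py.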

  1. Subscribe to the Microsoft Translator API on Azure Marketplace. I chose the free subscription.

  2. Register your application with Azure DataMarket: visit datamarket.azure.com/developer/applications/ using the LiveID credentials from step 1, then click "Register". Write down your client ID and client secret for later use.

  3. Install suds for Python: fedorahosted.org/suds/

  4. Modify babelfish.py (use your own client_id and secret), starting with the imports to add:

        from suds.client import Client
        import httplib
        import json
        import urllib  # provides urlencode(); harmless if babelfish.py already imports it

        ...

        # added function
        def soaped_babelfish(TextToTranslate, codeLangFrom, codeLangTo):
            """Translate text through the Microsoft Translator SOAP service."""

            # OAuth credentials: replace client_id and client_secret with your own
            params = urllib.urlencode({
                'client_id': 'babelfish_soaped',
                'client_secret': '1IkIG3j0ujiSMkTueCZ46iAY4fB1Nzr+rHBciHDCdxw=',
                'scope': 'http://api.microsofttranslator.com',
                'grant_type': 'client_credentials',
            })
            headers = {"Content-type": "application/x-www-form-urlencoded"}

            # request an access token from the Azure DataMarket OAuth endpoint
            conn = httplib.HTTPSConnection("datamarket.accesscontrol.windows.net")
            try:
                conn.request("POST", "/v2/OAuth2-13/", params, headers)
                data = conn.getresponse().read()
            finally:
                conn.close()

            # the endpoint answers with JSON, so parse it as JSON
            # (ast.literal_eval would mishandle escapes such as "\/")
            access_token = json.loads(data)['access_token']

            # call the web service with the access token
            client = Client('http://api.microsofttranslator.com/V2/Soap.svc')
            return client.service.Translate('Bearer ' + access_token,
                                            TextToTranslate, codeLangFrom, codeLangTo,
                                            'text/plain', 'general')

        ...

        # modified translate method
        def translate(phrase, source, target):
            phrase = clean(phrase)
            try:
                source_code = __languages[source]
                target_code = __languages[target]
            except KeyError as lang:
                raise ValueError("Language %s not available" % lang)

            return clean(soaped_babelfish(phrase, source_code, target_code))
        

That is all there is to the SOAPed version! Someday I will try a web-based solution (similar to the current babelfish.py, but adapted to the changes).
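
The token step in soaped_babelfish can be isolated and checked without touching the network. The sketch below builds the form body for the OAuth request and extracts the token from a response. It is written for Python 3 (on Python 2.7, `urlencode` lives in `urllib`), the environment variable names are hypothetical, and the sample response is made up rather than captured from the real service.

```python
import json
import os
from urllib.parse import urlencode  # Python 2.7: from urllib import urlencode


def build_token_request(client_id, client_secret):
    """Return the URL-encoded form body for the client-credentials request."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "http://api.microsofttranslator.com",
        "grant_type": "client_credentials",
    })


def parse_access_token(body):
    """Extract access_token from the JSON body returned by the token endpoint."""
    payload = json.loads(body)
    if "access_token" not in payload:
        raise ValueError("no access_token in response: %r" % (payload,))
    return payload["access_token"]


# Hypothetical variable names; keeps the secret out of the source file.
client_id = os.environ.get("MS_TRANSLATOR_CLIENT_ID", "my_client_id")
client_secret = os.environ.get("MS_TRANSLATOR_CLIENT_SECRET", "my+secret=")

print(build_token_request(client_id, client_secret))

# Made-up sample response, for illustration only.
sample = '{"token_type": "bearer", "access_token": "abc123", "expires_in": "600"}'
print(parse_access_token(sample))
```

Reading the credentials from the environment avoids committing a client secret into a shared copy of babelfish.py.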