Translating part of a JSON file into Hindi

Date: 2019-06-06 19:33:08

Tags: python json google-translate

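I have a JSON file named region_descriptions.json at this link. It does not load properly in Notepad++ on Windows (because it is a very large file), and it loads only partially in Google Chrome. The file is the dataset for my dense captioning task, and I need to write a Python script that translates every "phrase" in it into Hindi.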

In PowerShell, I navigated to the directory containing the JSON file and then set an environment variable with the following command:

>>$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\Preeti\Downloads\Compressed\region_descriptions.json"

After that, I opened a Jupyter notebook in the same directory and tried to run this code:

import ijson
from google.cloud import translate

translate_client = translate.Client()

parser = ijson.items(open("region_descriptions.json"), "item.regions.item")

maxTranslations = 100;
for region in parser:
    translation = translate_client.translate(region["phrase"], target_language="hi")

    print(region["phrase"])
    print(translation['translatedText'])

    maxTranslations-=1
    if maxTranslations==0:
        break

But the Jupyter notebook gives me an error:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-5fa13c6f3710> in <module>
      2 from google.cloud import translate
      3 
----> 4 translate_client = translate.Client()
      5 
      6 parser = ijson.items(open("region_descriptions.json"), "item.regions.item")

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\translate_v2\client.py in __init__(self, target_language, credentials, _http, client_info)
     75     ):
     76         self.target_language = target_language
---> 77         super(Client, self).__init__(credentials=credentials, _http=_http)
     78         self._connection = Connection(self, client_info=client_info)
     79 

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\client.py in __init__(self, credentials, _http)
    128             raise ValueError(_GOOGLE_AUTH_CREDENTIALS_HELP)
    129         if credentials is None and _http is None:
--> 130             credentials, _ = google.auth.default()
    131         self._credentials = google.auth.credentials.with_scopes_if_required(
    132             credentials, self.SCOPE

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in default(scopes, request)
    303 
    304     for checker in checkers:
--> 305         credentials, project_id = checker()
    306         if credentials is not None:
    307             credentials = with_scopes_if_required(credentials, scopes)

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _get_explicit_environ_credentials()
    163     if explicit_file is not None:
    164         credentials, project_id = _load_credentials_from_file(
--> 165             os.environ[environment_vars.CREDENTIALS])
    166 
    167         return credentials, project_id

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _load_credentials_from_file(filename)
    100     # The type key should indicate that the file is either a service account
    101     # credentials file or an authorized user credentials file.
--> 102     credential_type = info.get('type')
    103 
    104     if credential_type == _AUTHORIZED_USER_TYPE:

AttributeError: 'list' object has no attribute 'get'

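Can someone help me write a Python script that translates all the phrases in the JSON file into Hindi, or help me get past this error? I strongly recommend downloading the JSON file from the link above to better understand what I mean by "phrase".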

1 Answer:

Answer 0 (score: 0)

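Because the file is very large, you should use ijson, which parses it as a stream instead of loading it all into memory.

The following code worked for me: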

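import ijson
# On newer google-cloud-translate releases, this client lives in translate_v2:
#   from google.cloud import translate_v2 as translate
from google.cloud import translate

# Needs GOOGLE_APPLICATION_CREDENTIALS to point to a service-account key file.
translate_client = translate.Client()

max_translations = 100  # cap the number of API calls while testing

# Open in binary mode; ijson reads the file incrementally.
with open("region_descriptions.json", "rb") as f:
    # "item.regions.item" selects each entry of every top-level element's "regions" list.
    for region in ijson.items(f, "item.regions.item"):
        translation = translate_client.translate(region["phrase"], target_language="hi")

        print(region["phrase"])
        print(translation["translatedText"])

        max_translations -= 1
        if max_translations == 0:
            break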
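
To make the "item.regions.item" prefix concrete, here is a minimal sketch that uses only the standard json module on a tiny hand-made sample. It assumes the file is a top-level list of objects, each with a "regions" list whose entries carry a "phrase" key (inferred from the prefix and the code above). The sample values and the helper name iter_regions are made up for illustration. Unlike ijson it loads everything at once, so it is only meant to show which objects the prefix selects, and how itertools.islice could stand in for the manual counter and break.

```python
import json
from itertools import islice

# Hand-made sample mimicking the ASSUMED layout of region_descriptions.json.
SAMPLE = """
[
  {"id": 1, "regions": [{"phrase": "a clock on the wall"}, {"phrase": "a red car"}]},
  {"id": 2, "regions": [{"phrase": "a man wearing a hat"}]}
]
"""


def iter_regions(images):
    """Yield every region dict, i.e. what the prefix "item.regions.item" selects."""
    for image in images:                 # "item": each element of the top-level list
        for region in image["regions"]:  # "regions.item": each entry of its "regions" list
            yield region


images = json.loads(SAMPLE)

# islice stops after at most 2 regions, replacing the counter-and-break pattern.
first_phrases = [region["phrase"] for region in islice(iter_regions(images), 2)]
print(first_phrases)
```

With the real file, the same islice call could wrap the ijson.items(...) iterator directly, since that iterator is lazy as well.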

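If the code above does not work for you, keep the following points in mind:

  1. Don't forget to set up the GOOGLE_APPLICATION_CREDENTIALS environment variable. It must point to your Google Cloud credentials (service-account key) file, not to region_descriptions.json. The traceback in the question shows the credentials loader calling info.get('type') on a list, which is what happens when the variable points at the dataset instead.
  2. Once everything works, remove the break from the for loop.
  3. If you don't understand how ijson works, you will find this tutorial helpful.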
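
As a quick sanity check for point 1, the sketch below inspects the file that GOOGLE_APPLICATION_CREDENTIALS points to before any client is created. It is not part of google-auth. The function name check_credentials_file and its messages are made up for illustration, and the only thing it relies on from the traceback is that the loader expects a JSON object with a 'type' key.

```python
import json
import os

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def check_credentials_file(path):
    """Return a short diagnosis of whether `path` looks like a credentials file."""
    if not os.path.isfile(path):
        return "missing: no file at this path"
    with open(path, "r", encoding="utf-8") as f:
        # Peek at the first non-whitespace character so a huge data file
        # is rejected without being loaded into memory.
        first = f.read(1)
        while first and first.isspace():
            first = f.read(1)
        if first != "{":
            return "not a credentials file: top level is not a JSON object"
        f.seek(0)
        try:
            info = json.load(f)
        except ValueError:
            return "not a credentials file: invalid JSON"
    cred_type = info.get("type")
    if cred_type is None:
        return "not a credentials file: no 'type' key"
    return "looks plausible: type=" + str(cred_type)


if __name__ == "__main__":
    path = os.environ.get(ENV_VAR)
    if path is None:
        print(ENV_VAR + " is not set")
    else:
        print(check_credentials_file(path))
```

Run against the setup in the question, this should report that the top level is not a JSON object, because the variable points at the dataset. A passing check only means the file has the expected shape, not that the key is valid.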