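I have a JSON file named region_descriptions.json at this link. The file does not load properly in Notepad++ on Windows (because it is very large), and it loads only partially in Google Chrome. The file is the dataset for my dense captioning task, and I need to write a Python script that translates every "phrase" in it into Hindi.
In PowerShell, I navigated to the directory containing the JSON file and set an environment variable with the following command: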
>>$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\Preeti\Downloads\Compressed\region_descriptions.json"
After that, I opened a Jupyter notebook in the same directory and tried to run this code:
import ijson
from google.cloud import translate

translate_client = translate.Client()

parser = ijson.items(open("region_descriptions.json"), "item.regions.item")
maxTranslations = 100
for region in parser:
    translation = translate_client.translate(region["phrase"], target_language="hi")
    print(region["phrase"])
    print(translation['translatedText'])
    maxTranslations -= 1
    if maxTranslations == 0:
        break
But the Jupyter notebook gives me an error:
AttributeError Traceback (most recent call last)
<ipython-input-1-5fa13c6f3710> in <module>
2 from google.cloud import translate
3
----> 4 translate_client = translate.Client()
5
6 parser = ijson.items(open("region_descriptions.json"), "item.regions.item")
c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\translate_v2\client.py in __init__(self, target_language, credentials, _http, client_info)
75 ):
76 self.target_language = target_language
---> 77 super(Client, self).__init__(credentials=credentials, _http=_http)
78 self._connection = Connection(self, client_info=client_info)
79
c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\client.py in __init__(self, credentials, _http)
128 raise ValueError(_GOOGLE_AUTH_CREDENTIALS_HELP)
129 if credentials is None and _http is None:
--> 130 credentials, _ = google.auth.default()
131 self._credentials = google.auth.credentials.with_scopes_if_required(
132 credentials, self.SCOPE
c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in default(scopes, request)
303
304 for checker in checkers:
--> 305 credentials, project_id = checker()
306 if credentials is not None:
307 credentials = with_scopes_if_required(credentials, scopes)
c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _get_explicit_environ_credentials()
163 if explicit_file is not None:
164 credentials, project_id = _load_credentials_from_file(
--> 165 os.environ[environment_vars.CREDENTIALS])
166
167 return credentials, project_id
c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _load_credentials_from_file(filename)
100 # The type key should indicate that the file is either a service account
101 # credentials file or an authorized user credentials file.
--> 102 credential_type = info.get('type')
103
104 if credential_type == _AUTHORIZED_USER_TYPE:
AttributeError: 'list' object has no attribute 'get'
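Can someone help me write a Python script that translates all the phrases in the JSON file into Hindi, or help me get past this error? I strongly suggest downloading the JSON file from the given link to better understand what I mean by "phrase".
Answer 0 (score: 0)
Because the file is large, you should use ijson.
The following code worked for me: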
import ijson
from google.cloud import translate

# Uses the credentials file named by GOOGLE_APPLICATION_CREDENTIALS.
translate_client = translate.Client()

# Stream each entry of every "regions" array instead of loading the whole file.
parser = ijson.items(open("region_descriptions.json"), "item.regions.item")
maxTranslations = 100  # stop after this many phrases
for region in parser:
    translation = translate_client.translate(region["phrase"], target_language="hi")
    print(region["phrase"])
    print(translation['translatedText'])
    maxTranslations -= 1
    if maxTranslations == 0:
        break
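The loop above stops after maxTranslations phrases. As a minimal sketch of making that cap optional, the hypothetical helper below uses itertools.islice, so that limit=None processes every region. The names translate_phrases, translate, and limit are introduced here for illustration and are not part of ijson or the Google client; the sketch also assumes each region is a dict with a "phrase" key, as in the code above.

```python
from itertools import islice


def translate_phrases(regions, translate, limit=None):
    """Yield (phrase, translated_text) pairs for at most `limit` regions.

    `regions` is any iterable of dicts with a "phrase" key (for example the
    ijson parser above); `translate` is any callable mapping a string to its
    translation. Both names are assumptions made for this sketch.
    """
    # islice with a stop of None consumes the whole iterable.
    for region in islice(regions, limit):
        phrase = region["phrase"]
        yield phrase, translate(phrase)


if __name__ == "__main__":
    # Stand-in data and a stand-in translator, so the sketch runs offline.
    sample = [{"phrase": "a red car"}, {"phrase": "a tall tree"}]
    for phrase, translated in translate_phrases(sample, str.upper, limit=1):
        print(phrase, "->", translated)
```

With the real client, the translate argument could be a small function that calls translate_client.translate(text, target_language="hi") and returns its 'translatedText' entry.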
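You should consider the following points in case the above does not work for you:
- Remove the break from the for loop to translate every phrase instead of stopping after the first 100.
- If you want to understand how ijson works, you will find this tutorial helpful.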
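One further point can be read off the traceback in the question: the failure occurs in _load_credentials_from_file at info.get('type'), and the PowerShell command points GOOGLE_APPLICATION_CREDENTIALS at region_descriptions.json, whose top level is a list. The variable is meant to hold the path to a Google Cloud credentials file, which is a JSON object with a "type" key. A minimal sketch of a sanity check follows; looks_like_credentials_file is a hypothetical helper name, not part of google-auth.

```python
import json
import os


def looks_like_credentials_file(path):
    """Return True if the file at `path` is a JSON object with a "type" key."""
    with open(path, encoding="utf-8") as f:
        # Peek at the first non-whitespace character so that a huge dataset
        # file (a top-level JSON array) is rejected without being parsed.
        ch = f.read(1)
        while ch and ch.isspace():
            ch = f.read(1)
        if ch != "{":
            return False
        f.seek(0)
        try:
            info = json.load(f)
        except ValueError:
            # Starts with "{" but is not valid JSON.
            return False
    return "type" in info


if __name__ == "__main__":
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path is None:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not looks_like_credentials_file(path):
        print("This does not look like a credentials file:", path)
```

If the check fails, setting the variable to the path of the key file downloaded for your Google Cloud project, rather than to the dataset, should let translate.Client() load the credentials.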