We use the Python googleapiclient API to create jobs on AI Platform.
import datetime
import logging

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
training_inputs = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'packageUris': ['package_bucket_file_path'],
    'pythonModule': 'randomforest_trainer_RUL.train',
    'args': [
        '--trainFilePath', data[0],
        '--trainOutputPath', data[2],
        '--testFilePath', data[1],
        '--testOutputPath', data[3],
        '--target', target_label,
        '--bucket', BUCKET,
        '--expid', experiment_id,
    ],
    'region': 'region_of_bucket',
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5',
}
timestamp = datetime.datetime.now().strftime('%y%m%d_%H%M%S%f')
job_name = "job_" + experiment_id
## logging information
logging.info("Job Name:{}".format(job_name))
##
# job_spec was not defined in the original snippet; a job resource needs at least these two fields.
job_spec = {'jobId': job_name, 'trainingInput': training_inputs}
api = discovery.build('ml', 'v1', credentials=credentials, cache_discovery=False)
project_id = 'projects/{}'.format(PROJECT)
request = api.projects().jobs().create(body=job_spec, parent=project_id)
# Submitting the request was not shown in the original snippet.
response = request.execute()
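The snippet above computes `timestamp` but never uses it, while AI Platform expects job IDs to be unique within a project and to contain only letters, digits, and underscores, starting with a letter. A minimal sketch of one way to build such a name follows; `make_job_name` is a hypothetical helper, not part of the original code.

```python
import datetime
import re


def make_job_name(experiment_id, now=None):
    """Build a job ID of the form job_<experiment_id>_<timestamp>.

    Hypothetical helper: the original code used "job_" + experiment_id only.
    """
    now = now or datetime.datetime.now()
    timestamp = now.strftime('%y%m%d_%H%M%S%f')
    # Replace anything other than letters, digits and underscores.
    safe_id = re.sub(r'[^A-Za-z0-9_]', '_', str(experiment_id))
    return 'job_{}_{}'.format(safe_id, timestamp)
```

Appending the timestamp means resubmitting the same experiment does not collide with an earlier job of the same name.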
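This was working, and until yesterday I was able to train models, test them, and run predictions. But suddenly I can no longer train a model on AI Platform, and the error I get is: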
The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 810, in ls
    combined_listing = self._ls(path, detail) + self._ls(path + "/", detail)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-12>", line 2, in _ls
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 820, in _ls
    listing = self._list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-5>", line 2, in _list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 616, in _list_objects
    listing = self._do_list_objects(path)
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-6>", line 2, in _do_list_objects
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 637, in _do_list_objects
    maxResults=max_results,
  File "</root/.local/lib/python3.5/site-packages/decorator.py:decorator-gen-2>", line 2, in _call
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 50, in _tracemethod
    return f(self, *args, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 517, in _call
    validate_response(r, path)
  File "/root/.local/lib/python3.5/site-packages/gcsfs/core.py", line 171, in validate_response
    raise IOError("Forbidden: %s\n%s" % (path, msg))
OSError: Forbidden: https://www.googleapis.com/storage/v1/b/some-storage-bucket/o/
service-87XX90XX1XX@cloud-ml.google.com.iam.gserviceaccount.com does not have serviceusage.services.use access to project 34XX12XX12X.

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=87XX90XX1XX&resource=ml_job%2Fjob_id%2Fjob_5de3592da3c3c541d73389er&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22job_5de3592da3c3c541d73389erce%22
The key part of the error is:
service-87XX90XX1XX@cloud-ml.google.com.iam.gserviceaccount.com
does not have serviceusage.services.use access to project 34XX12XX12X
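When comparing failures across jobs, it can help to pull the account, permission, and project out of the message programmatically. The following is only a sketch that assumes the message keeps the wording shown above; `parse_permission_error` and `ERROR_PATTERN` are hypothetical names.

```python
import re

# Matches "<account> does not have <permission> access to project <project>".
ERROR_PATTERN = re.compile(
    r'(?P<account>\S+@\S+)\s+does not have\s+(?P<permission>\S+)\s+'
    r'access to project\s+(?P<project>[^\s.]+)'
)


def parse_permission_error(message):
    """Return the account, permission and project named in the error, or None."""
    match = ERROR_PATTERN.search(message)
    return match.groupdict() if match else None
```

In the message quoted here, the project number embedded in the service account name (87XX90XX1XX) is not the same as the project the call was denied on (34XX12XX12X), which may be worth checking when diagnosing the failure.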
Answer 0 (score: 2)
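I had exactly the same problem today. As Nick said, it is an issue with the new release of GCSFS. I suggest using the TensorFlow GFile function to read the CSV file directly from the bucket, instead of using pd.read_csv(gcs_path).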
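import pandas as pd
import tensorflow as tf

# The function name and signature are assumed; the original showed only the body.
def read_csv_from_gcs(gcs_path, opts=None):
    with tf.gfile.GFile(gcs_path) as f:
        if opts:
            # Assumes opts is a dict of pd.read_csv keyword arguments.
            df = pd.read_csv(f, **opts)
        else:
            df = pd.read_csv(f)
    return df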
This will let you run your jobs without interruption.
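The workaround relies on handing the parser an already-opened file object rather than a gs:// path, so the parser never has to resolve the path itself. A minimal stdlib-only sketch of that separation follows; `read_csv_rows` and its `open_fn` parameter are hypothetical, and the standard `csv` module stands in for pandas here so the example runs without extra packages.

```python
import csv
import io


def read_csv_rows(path, open_fn=open):
    """Parse a CSV from a file object produced by open_fn.

    open_fn is an assumption of this sketch: any callable that takes a path
    and returns a text file object usable as a context manager.
    """
    with open_fn(path) as f:
        return list(csv.DictReader(f))


# Local demonstration: an in-memory file stands in for a bucket object.
rows = read_csv_rows('ignored', open_fn=lambda _path: io.StringIO('a,b\n1,2\n'))
```

On AI Platform the opener would be something like tf.gfile.GFile, as in the answer's snippet; in TensorFlow 2.x the same class is exposed as tf.io.gfile.GFile.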