While following this guide to set up a Jupyter notebook on a Google Cloud Dataproc cluster, I get the following error:
gcloud dataproc clusters create my-name \
--project my-project-id \
--bucket my-bucket-name \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh
(gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Multiple validation errors:
- Insufficient 'CPUS' quota. Requested 12.0, available 8.0.
- This request exceeds CPU quota. Some things to try: request fewer workers (a minimum of 2 is required), use smaller master and/or worker machine types (such as n1-standard-2).
I am on the free trial and limited to 8 CPUs. How do I change the machine type? What settings would you recommend?
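Answer 0 (score: 2)
By default a cluster has a minimum of 2 workers (due to HDFS replication requirements) plus a master node, and the default machine type is n1-standard-4. Three 4-core nodes add up to the 12 CPUs in the error message. Since you only have 8 cores available, you need smaller machine types: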
gcloud dataproc clusters create my-name \
--project my-project-id \
--bucket my-bucket-name \
--master-machine-type n1-standard-2 \
--worker-machine-type n1-standard-2 \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh
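The quota arithmetic behind this can be checked with a small shell sketch. It assumes n1-standard-N machine types, where the numeric suffix is the vCPU count, and a cluster of one master plus two workers; the function names vcpus_of and cluster_cpus are made up for illustration and are not part of gcloud.

```shell
#!/bin/sh
# Sketch: estimate the vCPUs a cluster requests (1 master + N workers).
# Assumption: n1-standard-N machine types, whose suffix N is the vCPU count.

# vcpus_of TYPE: print the number after the last hyphen in the type name.
vcpus_of() {
  echo "${1##*-}"
}

# cluster_cpus MASTER_TYPE WORKER_TYPE NUM_WORKERS: print total vCPUs.
cluster_cpus() {
  master=$(vcpus_of "$1")
  worker=$(vcpus_of "$2")
  echo $(( master + worker * $3 ))
}

QUOTA=8  # free-trial CPU limit mentioned in the question

for type in n1-standard-4 n1-standard-2; do
  total=$(cluster_cpus "$type" "$type" 2)
  if [ "$total" -le "$QUOTA" ]; then
    echo "$type: $total vCPUs, fits within quota of $QUOTA"
  else
    echo "$type: $total vCPUs, exceeds quota of $QUOTA"
  fi
done
```

With two workers, n1-standard-4 comes to 12 vCPUs (the figure in the error message), while n1-standard-2 comes to 6, which fits under the limit of 8.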
Answer 1 (score: 1)
If you are following the "GOOGLE CLOUD BIG DATA AND MACHINE LEARNING BLOG" post (https://cloud.google.com/blog/big-data/2017/02/google-cloud-platform-for-data-scientists-using-jupyter-notebooks-with-apache-spark-on-google-cloud), you have to modify @Dennis Huo's solution slightly:
gcloud dataproc clusters create datascience \
--master-machine-type n1-standard-2 \
--worker-machine-type n1-standard-2 \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh
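The --project and --bucket flags are left out because setting them causes errors.
Note:
ERROR: (gcloud.dataproc.clusters.create) PERMISSION_DENIED: Not allowed to get project settings for project my-project-id
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Access denied for Google Cloud Storage bucket: 'my-bucket-name'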
Answer 2 (score: 0)
You can pass the project information. For example:
gcloud dataproc clusters create $CLUSTERNAME \
--project $PROJECT \
--num-workers $WORKERS \
--bucket $BUCKET \
--master-machine-type $VMMASTER \
--worker-machine-type $VMWORKER \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--scopes cloud-platform
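The command above relies on shell variables that must be set first. A minimal sketch follows, using placeholder values borrowed from the question (my-name, my-project-id, my-bucket-name) and the machine type from the accepted answer; all of these values are assumptions to be replaced with your own. The helper build_command is hypothetical and only prints the expanded command so it can be reviewed before running it.

```shell
#!/bin/sh
# Placeholder values for the variables used in the command; replace them.
CLUSTERNAME=my-name
PROJECT=my-project-id
BUCKET=my-bucket-name
WORKERS=2                 # minimum worker count stated in the error message
VMMASTER=n1-standard-2
VMWORKER=n1-standard-2

# Print the fully expanded command instead of executing it (a dry run).
build_command() {
  echo gcloud dataproc clusters create "$CLUSTERNAME" \
    --project "$PROJECT" \
    --num-workers "$WORKERS" \
    --bucket "$BUCKET" \
    --master-machine-type "$VMMASTER" \
    --worker-machine-type "$VMWORKER" \
    --initialization-actions \
    gs://dataproc-initialization-actions/jupyter/jupyter.sh \
    --scopes cloud-platform
}

build_command
```

Once the printed command looks right, removing the leading echo inside build_command would run it for real.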