No storage.objects.get access

Asked: 2019-01-14 08:46:14

Tags: hadoop google-cloud-platform google-cloud-storage google-cloud-dataproc

I can't resolve a GCS bucket permission problem when submitting a job to Dataproc.

Here is what I am doing:

  1. Created a project
  2. Created a bucket, xmitya-test
  3. Created a cluster:
gcloud dataproc clusters create cascade --bucket=xmitya-test \
    --master-boot-disk-size=80G --master-boot-disk-type=pd-standard \
    --num-master-local-ssds=0 --num-masters=1 \
    --num-workers=2 --num-worker-local-ssds=0 \
    --worker-boot-disk-size=80G --worker-boot-disk-type=pd-standard \
    --master-machine-type=n1-standard-2 \
    --worker-machine-type=n1-standard-2 \
    --zone=us-west1-a --image-version=1.3 \
    --properties 'hadoop-env:HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*'
  4. Uploaded the job jar /apps/wordcount.jar and the library /apps/lib/commons-collections-3.2.2.jar
  5. Then submitted a job with the library jar on the classpath:
gcloud dataproc jobs submit hadoop --cluster=cascade \
    --jar=gs:/apps/wordcount.jar \
    --jars=gs://apps/lib/commons-collections-3.2.2.jar --bucket=xmitya-test \
    -- gs:/input/url+page.200.txt gs:/output/wc.out local

The job then fails because it cannot access the library file:

java.io.IOException: Error accessing: bucket: apps, object: lib/commons-collections-3.2.2.jar
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1957)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1983)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1870)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1156)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1058)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2344)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.copyToLocalFile(GoogleHadoopFileSystemBase.java:1793)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2320)
    at com.google.cloud.hadoop.services.agent.util.HadoopUtil.download(HadoopUtil.java:70)
    at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler.downloadResources(AbstractJobHandler.java:448)
    at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:579)
    at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:568)
    at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "714526773712-compute@developer.gserviceaccount.com does not have storage.objects.get access to apps/lib/commons-collections-3.2.2.jar.",
    "reason" : "forbidden"
  } ],
  "message" : "714526773712-compute@developer.gserviceaccount.com does not have storage.objects.get access to apps/lib/commons-collections-3.2.2.jar."
}
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1978)
    ... 23 more

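I tried granting read access to the 714526773712-compute@developer.gserviceaccount.com user in the browser console, and also setting public permissions on all files with gsutil defacl ch -u AllUsers:R gs://xmitya-test and gsutil acl ch -d allUsers:R gs://xmitya-test/**. Neither had any effect.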

What could be the cause? Thanks!

1 Answer:

Answer 0 (score: 1)

It is complaining about access to the apps, input, and output buckets that you specified in the arguments of the job submission command:

  

gcloud dataproc jobs submit hadoop --cluster=cascade --jar=gs:/apps/wordcount.jar --jars=gs://apps/lib/commons-collections-3.2.2.jar --bucket=xmitya-test -- gs:/input/url+page.200.txt gs:/output/wc.out local
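To see why the error names a bucket called apps, it may help to split the --jars URI by hand. The following is a minimal sketch using plain shell parameter expansion; gcs_bucket and gcs_object are hypothetical helper names introduced here for illustration, not part of gcloud or gsutil, and they only handle fully qualified gs:// URIs.

```shell
# Hypothetical helpers (not part of gcloud/gsutil): split a gs:// URI
# into the bucket and object parts that the error message reports.

gcs_bucket() {
  gcs_rest=${1#gs://}                 # strip the gs:// scheme
  printf '%s\n' "${gcs_rest%%/*}"     # keep everything before the first slash
}

gcs_object() {
  gcs_rest=${1#gs://}                 # strip the gs:// scheme
  case "$gcs_rest" in
    */*) printf '%s\n' "${gcs_rest#*/}" ;;  # keep everything after the first slash
    *)   printf '\n' ;;                     # bucket only, no object part
  esac
}

# The URI passed to --jars in the question:
gcs_bucket "gs://apps/lib/commons-collections-3.2.2.jar"   # prints: apps
gcs_object "gs://apps/lib/commons-collections-3.2.2.jar"   # prints: lib/commons-collections-3.2.2.jar

# The same file addressed inside the xmitya-test bucket:
gcs_bucket "gs://xmitya-test/apps/lib/commons-collections-3.2.2.jar"   # prints: xmitya-test
```

The first pair of values matches the "bucket: apps, object: lib/commons-collections-3.2.2.jar" line in the stack trace, which suggests the request never reached xmitya-test at all.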

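In a gs:// URI, the first segment after the scheme is the bucket name, so gs://apps/lib/commons-collections-3.2.2.jar refers to a bucket named apps, not to a folder in your bucket. To fix this, you need to either grant access to those buckets or, if apps, input, and output are actually folders inside the xmitya-test bucket, specify the bucket explicitly in each path, for example gs://xmitya-test/apps/wordcount.jar.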
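As a rough sketch of the second option, assuming the jar, library, and input file were all uploaded under gs://xmitya-test/ (the question implies this but does not show the upload commands), the submission could be rebuilt with fully qualified paths. to_gcs_uri is a hypothetical helper introduced here, and the script only prints the command so it can be reviewed before running.

```shell
# Assumption: all job files live in the xmitya-test bucket.
BUCKET="xmitya-test"

# Hypothetical helper: turn a path inside the bucket into a
# fully qualified gs://BUCKET/path URI.
to_gcs_uri() {
  printf 'gs://%s/%s\n' "$BUCKET" "${1#/}"   # drop one leading slash, if any
}

# Print the corrected command instead of executing it.
echo gcloud dataproc jobs submit hadoop --cluster=cascade \
  "--jar=$(to_gcs_uri /apps/wordcount.jar)" \
  "--jars=$(to_gcs_uri /apps/lib/commons-collections-3.2.2.jar)" \
  --bucket="$BUCKET" \
  -- "$(to_gcs_uri /input/url+page.200.txt)" "$(to_gcs_uri /output/wc.out)" local
```

Every path now starts with gs://xmitya-test/, so the cluster's service account only needs access to that one bucket.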