I am running into the error below while trying to write to BigQuery with the spark-bigquery connector. The application runs on a Hadoop cluster (not Dataproc).
java.io.IOException: Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:236)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:91)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1533)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1554)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:654)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:617)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at com.google.cloud.spark.bigquery.BigQueryWriteHelper.<init>(BigQueryWriteHelper.scala:62)
    at com.google.cloud.spark.bigquery.BigQueryInsertableRelation.insert(BigQueryInsertableRelation.scala:42)
    at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:112)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:664)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:664)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:664)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
Here is the code:
dataset.write().format("bigquery")
    .option("temporaryGcsBucket", tempGcsBucket)
    //.option("table", databaseName + "." + tableName)
    .option("project", projectId)
    .option("parentProject", parentProjectId)
    .option("credentials", credentials)
    .mode(saveMode)
    .save(projectId + "." + databaseName + "." + tableName);
I am able to read from the very table I am trying to write to, using the same credentials (a base64-encoded service account key). I am using version spark-bigquery-with-dependencies_2.11-0.19.1 of the connector.
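For context, this is roughly how the credentials string is produced and how the (working) read looks. The key file path is illustrative, and spark, parentProjectId, projectId, databaseName, and tableName are the same variables used in the write above; exception handling is omitted:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Base64-encode the service account JSON key, which is what the
// connector's "credentials" option expects (path is illustrative):
String credentials = Base64.getEncoder().encodeToString(
        Files.readAllBytes(Paths.get("/path/to/service-account-key.json")));

// Reading the same table with these credentials succeeds:
Dataset<Row> existing = spark.read().format("bigquery")
        .option("parentProject", parentProjectId)
        .option("credentials", credentials)
        .load(projectId + "." + databaseName + "." + tableName);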
The same code runs fine in our lower environments, where the project and the parent project are the same. In production, however, they are different.