I want to run a Dataflow batch job in GCP project A. The pipeline's source is a Datastore in another project (project B). The pipeline works with DirectPipelineRunner, but when I switch to DataflowPipelineRunner I get this error: Request failed with code 403, will not retry: https://www.googleapis.com/datastore/v1beta2/datasets/projectb/runQuery. What is the correct way to do this?
I have added project A's service account to project B. I have also set the pipeline options' GCP credential from the service account's P12 certificate.
Pipeline code:
import com.google.api.client.util.SecurityUtils;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.datastore.DatastoreV1;
import com.google.api.services.datastore.client.DatastoreHelper;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.DatastoreIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.common.base.Strings;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Sample {

    private static final Logger log = Logger.getLogger(Sample.class.getName());

    public static void main(String[] args) throws Exception {
        DataflowPipelineOptions options = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
        //options.setRunner(DirectPipelineRunner.class);
        options.setRunner(DataflowPipelineRunner.class);
        options.setProject("project_a");
        // Your Google Cloud Storage path for staging local files.
        options.setStagingLocation("gs://project_a_folder/staging");
        // Explicit service-account credential built from a P12 key on the classpath.
        options.setGcpCredential(
            DatastoreHelper.getServiceAccountCredential(
                "project_a@developer.gserviceaccount.com",
                SecurityUtils.loadPrivateKeyFromKeyStore(
                    SecurityUtils.getPkcs12KeyStore(),
                    Sample.class.getResourceAsStream("/projecta-0450c49cbddc.p12"),
                    "notasecret",
                    "privatekey",
                    "notasecret"),
                Arrays.asList(
                    "https://www.googleapis.com/auth/cloud-platform",
                    "https://www.googleapis.com/auth/devstorage.full_control",
                    "https://www.googleapis.com/auth/userinfo.email",
                    "https://www.googleapis.com/auth/datastore")));

        Pipeline pipeline = Pipeline.create(options);

        // Query the "Entity" kind in project B's Datastore.
        DatastoreV1.Query.Builder q = DatastoreV1.Query.newBuilder();
        q.addKindBuilder().setName("Entity");
        q.setFilter(DatastoreHelper.makeFilter("property",
            DatastoreV1.PropertyFilter.Operator.EQUAL,
            DatastoreHelper.makeValue("somevalue")));

        // Output schema for the BigQuery table.
        List<TableFieldSchema> fields = new ArrayList<>();
        fields.add(new TableFieldSchema().setName("f1").setType("STRING"));
        fields.add(new TableFieldSchema().setName("f2").setType("STRING"));
        TableSchema tableSchema = new TableSchema().setFields(fields);

        pipeline.apply(DatastoreIO.readFrom("projectb", q.build()))
            .apply(ParDo.of(new DoFn<DatastoreV1.Entity, KV<String, String>>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    try {
                        Map<String, DatastoreV1.Value> propertyMap = DatastoreHelper.getPropertyMap(c.element());
                        String p1 = DatastoreHelper.getString(propertyMap.get("p1"));
                        String p2 = DatastoreHelper.getString(propertyMap.get("p2"));
                        if (!Strings.isNullOrEmpty(p1)) {
                            c.output(KV.of(p2, p1));
                        }
                    } catch (Exception e) {
                        log.log(Level.SEVERE, "Failed to output entity data", e);
                    }
                }
            }))
            .apply(ParDo.of(new DoFn<KV<String, String>, TableRow>() {
                @Override
                public void processElement(ProcessContext c) throws Exception {
                    TableRow tableRow = new TableRow();
                    tableRow.set("f1", c.element().getKey());
                    tableRow.set("f2", c.element().getValue());
                    c.output(tableRow);
                }
            }))
            .apply(BigQueryIO.Write.to("dataset.table")
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                .withSchema(tableSchema));

        pipeline.run();
    }
}
Answer 0 (score: 1)
Cross-project Datastore access from Dataflow should work if project A's service account has been added as an admin of project B and the Cloud Datastore API is enabled in both projects.
I don't think you need to do any manual credential handling; Dataflow should automatically run as project A's service account.
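If that IAM setup is in place, the setup from the question can be reduced to something like the sketch below. This is a minimal, untested sketch against the same Dataflow SDK 1.x / Datastore v1beta2 APIs the question uses; the class name CrossProjectRead is made up, and the project, bucket, and kind names are the placeholders from the question. The only substantive change is dropping the setGcpCredential(...) call, so the job authenticates as project A's default service account:

import com.google.api.services.datastore.DatastoreV1;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.DatastoreIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class CrossProjectRead {
    public static void main(String[] args) {
        // Same options as in the question, minus the manual P12 credential:
        // the job then runs as project A's default service account, which is
        // the account that must be granted access on project B.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
        options.setRunner(DataflowPipelineRunner.class);
        options.setProject("project_a");
        options.setStagingLocation("gs://project_a_folder/staging");

        Pipeline pipeline = Pipeline.create(options);

        // The read still names project B's dataset explicitly; authorization
        // comes from the IAM membership, not from a hand-built credential.
        DatastoreV1.Query.Builder q = DatastoreV1.Query.newBuilder();
        q.addKindBuilder().setName("Entity");
        pipeline.apply(DatastoreIO.readFrom("projectb", q.build()));

        pipeline.run();
    }
}

If the 403 persists with this setup, check in the Dataflow job details which account the workers actually run as, and confirm that exact account is listed in project B's permissions.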