Update: Solved. My pom depends on

    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>0.6.0</version>
    </dependency>

which itself pulls in the com.fasterxml.jackson packages, and that was the root of the problem. I found this out by adding my dependencies one by one to a fresh project. I am now shading these dependencies:
    <relocations>
        <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>com.shaded_storage.google.protobuf</shadedPattern>
        </relocation>
        <relocation>
            <pattern>com.fasterxml</pattern>
            <shadedPattern>com.fasterxml.shaded</shadedPattern>
        </relocation>
        <relocation>
            <pattern>org.codehaus.jackson</pattern>
            <shadedPattern>org.codehaus.jackson.shaded</shadedPattern>
        </relocation>
    </relocations>
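For anyone unfamiliar with where these `<relocation>` entries live: they go inside the maven-shade-plugin configuration in the pom. A minimal sketch of the surrounding plugin block (the plugin version here is an assumption — use whatever your build already pins; the relocation shown is just one of the three above):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- version is an assumption, not from my actual pom -->
      <version>3.0.0</version>
      <executions>
        <execution>
          <!-- bind the shade goal to the package phase -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- rewrite the jackson packages so they cannot clash
                   with the versions bundled by the Dataflow worker -->
              <relocation>
                <pattern>com.fasterxml</pattern>
                <shadedPattern>com.fasterxml.shaded</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```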
This seems to work in my test project. I will now try it against the main project and hope it works there too.
I had to adjust a few deps, and now it seems to work. Hopefully that is still the case once the project is running on Google App Engine again.
Original question (pipeline stuck — started but not running):
Today I updated my Dataflow project from Google's SDK version 1.9.0 to Apache Beam 2.1.0. The pipeline runs on Google Cloud.
Because I had messed up some data beforehand, I wanted to run a very small clean-up pipeline that loads 343 rows from Google Datastore, converts the entities to POJOs, and writes them back to Google Datastore (where they are updated by key and by a property named ID).
I started the pipeline 20 minutes ago and it still has not loaded a single entity from Datastore. It seems to be stuck. I will kill the job and start it a second time, but I would like to know what the problem is.
The JobId: 2017-09-19_10_08_32-9486491048477927168
The pipeline:
Summary etc. from the console:
The code:
    public static void main(String... args) {
        Pipeline pipe = PipelineCreation.getN1Standard1("updatekeys");
        pipe.apply("ReadArticles", DatastoreIO.v1().read()
                .withNamespace("XXXXX")
                .withProjectId(C.PROJECT_ID)
                .withQuery(DatastoreUtil.getPropertyEqualsQuery("article", "catalogKey", "XXXX")))
            .apply("ToPojo", ParDo.of(new ArticleFromEntity()))
            .apply("ToEntity", ParDo.of(new ArticleToEntity("XXXXX", "article")))
            .apply(DatastoreIO.v1().write().withProjectId(C.PROJECT_ID));
        pipe.run();
    }
    public static Pipeline getN1Standard1(String jobName) {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setProject("bmt-katalogintegration");
        options.setRunner(DataflowRunner.class);
        options.setStagingLocation("gs://katalog-df-staging/binaries");
        options.setTempLocation("gs://katalog-df-staging/binaries/tmp");
        options.setGcpTempLocation("gs://katalog-df-staging/binaries/tmp");
        options.setZone("europe-west3-a");
        options.setJobName(jobName);
        options.setMaxNumWorkers(6);
        options.setWorkerMachineType("n1-standard-1");
        return Pipeline.create(options);
    }
    public static Query getPropertyEqualsQuery(String kind, String property, String value) {
        Query query = Query.newBuilder()
            .addKind(KindExpression.newBuilder().setName(kind).build())
            .setFilter(Filter.newBuilder()
                .setPropertyFilter(PropertyFilter.newBuilder()
                    .setProperty(PropertyReference.newBuilder()
                        .setName(property))
                    .setOp(PropertyFilter.Operator.EQUAL)
                    .setValue(Value.newBuilder().setStringValue(value).build())
                    .build())
                .build())
            .build();
        return query;
    }
Any feedback would be highly appreciated.
Edit: The second job doesn't work either. JobId: 2017-09-19_10_46_05-8997694731596581186
Edit: Errors from Stackdriver:
E at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness.main(DataflowBatchWorkerHarness.java:56)
E at com.google.cloud.dataflow.worker.DataflowWorkerHarnessHelper.initializeGlobalStateAndPipelineOptions(DataflowWorkerHarnessHelper.java:39)
E at com.google.cloud.dataflow.worker.WorkerPipelineOptionsFactory.createFromSystemProperties(WorkerPipelineOptionsFactory.java:48)
E at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2858)
E at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3814)
E at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserialize(AbstractDeserializer.java:216)
E at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1012)
E com.fasterxml.jackson.databind.JsonMappingException: Can not construct instance of org.apache.beam.sdk.options.PipelineOptions: abstract types either need to be mapped to concrete types, have custom deserializer, or contain additional type information
E at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1456)
E at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:270)
E at [Source: {"display_data":[{"key":"jobName","namespace":"org.apache.beam.sdk.options.PipelineOptions","type":"STRING","value":"updatekeys4"},{"key":"tempLocation","namespace":"org.apache.beam.sdk.options.PipelineOptions","type":"STRING","value":"gs://katalog-df-staging/binaries/tmp"},{"key":"zone","namespace":"org.apache.beam.sdk.extensions.gcp.options.GcpOptions","type":"STRING","value":"europe-west3-a"},{"key":"appName","namespace":"org.apache.beam.sdk.options.ApplicationNameOptions","type":"STRING","value":"PipelineCreation"},{"key":"stagingLocation","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineOptions","type":"STRING","value":"gs://katalog-df-staging/binaries"},{"key":"zone","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"STRING","value":"europe-west3-a"},{"key":"maxNumWorkers","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"INTEGER","value":6},{"key":"project","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineOptions","type":"STRING","value":"bmt-katalogintegration"},{"key":"workerMachineType","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"STRING","value":"n1-standard-1"},{"key":"runner","namespace":"org.apache.beam.sdk.options.PipelineOptions","shortValue":"DataflowRunner","type":"JAVA_CLASS","value":"org.apache.beam.runners.dataflow.DataflowRunner"},{"key":"gcpTempLocation","namespace":"org.apache.beam.sdk.extensions.gcp.options.GcpOptions","type":"STRING","value":"gs://katalog-df-staging/binaries/tmp"}],"options":{"apiRootUrl":"https://dataflow.googleapis.com/","appName":"PipelineCreation","autoscalingAlgorithm":"NONE","credentialFactoryClass":"org.apache.beam.sdk.extensions.gcp.auth.GcpCredentialFactory","dataflowEndpoint":"","dataflowJobId":"2017-09-20_04_31_04-12049147039561540392","enableCloudDebugger":false,"gcpTempLocation":"gs://katalog-df-staging/binaries/t
mp","jobName":"updatekeys4","maxNumWorkers":6,"numWorkers":3,"numberOfWorkerHarnessThreads":0,"pathValidatorClass":"org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator","project":"bmt-katalogintegration","runner":"org.apache.beam.runners.dataflow.DataflowRunner","stableUniqueNames":"WARNING","stagerClass":"org.apache.beam.runners.dataflow.util.GcsStager","stagingLocation":"gs://katalog-df-staging/binaries","streaming":false,"tempLocation":"gs://katalog-df-staging/binaries/tmp","workerMachineType":"n1-standard-1","zone":"europe-west3-a"}}; line: 1, column: 1]
E Uncaught exception in main thread. Exiting with status code 1.
W Please use a logger instead of System.out or System.err.
Please switch to using org.slf4j.Logger.
See: https://cloud.google.com/dataflow/pipelines/logging
E Uncaught exception in main thread. Exiting with status code 1.