Apache Beam on Google Cloud stopped / not running after upgrading from 1.9.0 to 2.1.0

Asked: 2017-09-19 17:39:06

Tags: google-cloud-datastore google-cloud-platform google-cloud-dataflow apache-beam

Update: Solved: My pom had a dependency on

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>0.6.0</version>
</dependency>

which itself pulls in the com.fasterxml.jackson packages, and that turned out to be the root of the problem. I found this out by adding my dependencies step by step to a fresh starter project. I now shade these dependencies:

<relocations>
    <relocation>
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>com.shaded_storage.google.protobuf</shadedPattern>
    </relocation>
    <relocation>
        <pattern>com.fasterxml</pattern>
        <shadedPattern>com.fasterxml.shaded</shadedPattern>
    </relocation>
    <relocation>
        <pattern>org.codehaus.jackson</pattern>
        <shadedPattern>org.codehaus.jackson.shaded</shadedPattern>
    </relocation>
</relocations>

This seems to work in my starter project. I will now test it against the main project and hope that it works there as well.

I had to adjust a few deps, but now it seems to work. Hopefully that remains the case once the project is running on Google App Engine again.
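For reference, this is roughly where the relocations above sit inside the maven-shade-plugin configuration; a minimal sketch, where the plugin version and execution binding are assumptions rather than copied from my actual pom:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- the <relocation> entries shown above go here -->
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>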

The original question about the stuck pipeline, which started but never ran:

Today I updated my Dataflow project from Google's 1.9.0 SDK to Apache Beam 2.1.0. The pipelines run on Google Cloud Dataflow.

Because I had messed up some data earlier, I wanted to run a very small sample pipeline to clean it up: it loads 343 rows from Google Datastore, converts the entities to POJOs, and writes them back to Google Datastore (whereby the key and a property named ID get updated).

I started the pipeline 20 minutes ago and it still has not loaded a single entity from Datastore. It looks like it is stuck. I will kill the job and start it a second time, but I would like to know what the problem is.

The JobId: 2017-09-19_10_08_32-9486491048477927168
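As an aside, instead of killing the job in the console, the run can also be bounded from code; a small sketch using the Beam 2.x PipelineResult API, where the 20-minute timeout and the class name are my own choices:

import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.joda.time.Duration;

public class RunOrCancel {
    // Run the pipeline, wait up to 20 minutes, and cancel the job if it has
    // not reached a terminal state by then.
    public static void runOrCancel(Pipeline pipe) throws IOException {
        PipelineResult result = pipe.run();
        // waitUntilFinish returns the terminal state, or null if the timeout
        // elapses first (behavior may vary slightly by runner).
        PipelineResult.State state = result.waitUntilFinish(Duration.standardMinutes(20));
        if (state == null || !state.isTerminal()) {
            result.cancel(); // asks the runner (here: Dataflow) to cancel the job
        }
    }
}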

The logs: [screenshot]

The pipeline: [screenshot]

Summary etc. from the console: [screenshot]

The code:

public static void main(String... args) {
    Pipeline pipe = PipelineCreation.getN1Standard1("updatekeys");
    // Read the matching entities, convert them to POJOs and back to entities,
    // then write them back to Datastore.
    pipe.apply("ReadArticles", DatastoreIO.v1().read()
            .withNamespace("XXXXX")
            .withProjectId(C.PROJECT_ID)
            .withQuery(DatastoreUtil.getPropertyEqualsQuery("article", "catalogKey", "XXXX")))
        .apply("ToPojo", ParDo.of(new ArticleFromEntity()))
        .apply("ToEntity", ParDo.of(new ArticleToEntity("XXXXX", "article")))
        .apply(DatastoreIO.v1().write().withProjectId(C.PROJECT_ID));
    pipe.run();
}

public static Pipeline getN1Standard1(String jobName){
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("bmt-katalogintegration");
    options.setRunner(DataflowRunner.class);
    options.setStagingLocation("gs://katalog-df-staging/binaries"); 
    options.setTempLocation("gs://katalog-df-staging/binaries/tmp");
    options.setGcpTempLocation("gs://katalog-df-staging/binaries/tmp");
    options.setZone("europe-west3-a");
    options.setJobName(jobName);
    options.setMaxNumWorkers(6);
    options.setWorkerMachineType("n1-standard-1");
    return Pipeline.create(options);
}

public static Query getPropertyEqualsQuery(String kind, String property, String value){
    Query query = Query.newBuilder()
            .addKind(KindExpression.newBuilder().setName(kind).build())
            .setFilter(Filter.newBuilder()
                    .setPropertyFilter(PropertyFilter.newBuilder()
                            .setProperty(PropertyReference.newBuilder()
                                    .setName(property))
                            .setOp(PropertyFilter.Operator.EQUAL)
                            .setValue(Value.newBuilder().setStringValue(value).build())
                            .build())
                    .build())
            .build();
    return query;
}
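For completeness, the imports these snippets rely on (Beam 2.1.0 package locations; PipelineCreation, DatastoreUtil, C, ArticleFromEntity and ArticleToEntity are my own classes and not shown here):

import com.google.datastore.v1.KindExpression;
import com.google.datastore.v1.PropertyFilter;
import com.google.datastore.v1.PropertyReference;
import com.google.datastore.v1.Query;
import com.google.datastore.v1.Value;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;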

Any feedback would be highly appreciated.

Edit: The second job did not work either. JobId: 2017-09-19_10_46_05-8997694731596581186

Edit: The error from Stackdriver:

E  com.fasterxml.jackson.databind.JsonMappingException: Can not construct instance of org.apache.beam.sdk.options.PipelineOptions: abstract types either need to be mapped to concrete types, have custom deserializer, or contain additional type information
E   at [Source: {"display_data":[{"key":"jobName","namespace":"org.apache.beam.sdk.options.PipelineOptions","type":"STRING","value":"updatekeys4"},{"key":"tempLocation","namespace":"org.apache.beam.sdk.options.PipelineOptions","type":"STRING","value":"gs://katalog-df-staging/binaries/tmp"},{"key":"zone","namespace":"org.apache.beam.sdk.extensions.gcp.options.GcpOptions","type":"STRING","value":"europe-west3-a"},{"key":"appName","namespace":"org.apache.beam.sdk.options.ApplicationNameOptions","type":"STRING","value":"PipelineCreation"},{"key":"stagingLocation","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineOptions","type":"STRING","value":"gs://katalog-df-staging/binaries"},{"key":"zone","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"STRING","value":"europe-west3-a"},{"key":"maxNumWorkers","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"INTEGER","value":6},{"key":"project","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineOptions","type":"STRING","value":"bmt-katalogintegration"},{"key":"workerMachineType","namespace":"org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions","type":"STRING","value":"n1-standard-1"},{"key":"runner","namespace":"org.apache.beam.sdk.options.PipelineOptions","shortValue":"DataflowRunner","type":"JAVA_CLASS","value":"org.apache.beam.runners.dataflow.DataflowRunner"},{"key":"gcpTempLocation","namespace":"org.apache.beam.sdk.extensions.gcp.options.GcpOptions","type":"STRING","value":"gs://katalog-df-staging/binaries/tmp"}],"options":{"apiRootUrl":"https://dataflow.googleapis.com/","appName":"PipelineCreation","autoscalingAlgorithm":"NONE","credentialFactoryClass":"org.apache.beam.sdk.extensions.gcp.auth.GcpCredentialFactory","dataflowEndpoint":"","dataflowJobId":"2017-09-20_04_31_04-12049147039561540392","enableCloudDebugger":false,"gcpTempLocation":"gs://katalog-df-staging/binaries/tmp","jobName":"updatekeys4","maxNumWorkers":6,"numWorkers":3,"numberOfWorkerHarnessThreads":0,"pathValidatorClass":"org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator","project":"bmt-katalogintegration","runner":"org.apache.beam.runners.dataflow.DataflowRunner","stableUniqueNames":"WARNING","stagerClass":"org.apache.beam.runners.dataflow.util.GcsStager","stagingLocation":"gs://katalog-df-staging/binaries","streaming":false,"tempLocation":"gs://katalog-df-staging/binaries/tmp","workerMachineType":"n1-standard-1","zone":"europe-west3-a"}}; line: 1, column: 1]
E   at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:270)
E   at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1456)
E   at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1012)
E   at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserialize(AbstractDeserializer.java:216)
E   at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3814)
E   at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2858)
E   at com.google.cloud.dataflow.worker.WorkerPipelineOptionsFactory.createFromSystemProperties(WorkerPipelineOptionsFactory.java:48)
E   at com.google.cloud.dataflow.worker.DataflowWorkerHarnessHelper.initializeGlobalStateAndPipelineOptions(DataflowWorkerHarnessHelper.java:39)
E   at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness.main(DataflowBatchWorkerHarness.java:56)
E  Uncaught exception in main thread. Exiting with status code 1.
W  Please use a logger instead of System.out or System.err. Please switch to using org.slf4j.Logger. See: https://cloud.google.com/dataflow/pipelines/logging
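This matches the solution in the update at the top: with a second Jackson version on the worker classpath, the worker cannot deserialize the abstract PipelineOptions type. A minimal sketch of the same failure mode in plain Jackson (hypothetical class, not Beam's actual deserialization code):

import com.fasterxml.jackson.databind.ObjectMapper;

public class AbstractTypeDemo {
    // Jackson cannot construct an abstract type unless a module, concrete
    // mapping, or custom deserializer is registered for it.
    abstract static class Options {}

    public static void main(String[] args) throws Exception {
        // Fails with the same "Can not construct instance of ...: abstract types
        // either need to be mapped to concrete types, ..." message seen in the
        // log above (exact wording depends on the Jackson version).
        new ObjectMapper().readValue("{}", Options.class);
    }
}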

0 Answers