Error saving a string longer than 1500 bytes to Datastore from the Dataflow API

Time: 2016-12-14 18:08:10

Tags: google-app-engine google-cloud-datastore google-cloud-dataflow

When I try to save a very long string, my Dataflow job throws this error message: The value of property "myProperty" is longer than 1500 bytes., code=INVALID_ARGUMENT.

The error occurs when I follow Google's DatastoreWordCount example and save a string longer than 1500 bytes.

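I know that with the App Engine Datastore API I can save strings longer than 1500 bytes by storing the property as a com.google.appengine.api.datastore.Text. However, neither the DatastoreWordCount example nor the DatastoreHelper class documentation shows an alternative that indicates the Text type is supported.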

Is it possible to save such a long string with this API so that it can be read back as a com.google.appengine.api.datastore.Text?

The full error message is:

java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: com.google.datastore.v1.client.DatastoreException: The value of property "dalekTestExecutions" is longer than 1500 bytes., code=INVALID_ARGUMENT
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:284)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:508)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsAndCombineDoFn.closeWindow(GroupAlsoByWindowsAndCombineDoFn.java:205)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsAndCombineDoFn.processElement(GroupAlsoByWindowsAndCombineDoFn.java:192)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:224)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:185)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:72)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

3 answers:

Answer 0 (score: 4)

You can save strings longer than 1500 bytes by excluding the value from indexes:

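// Value is com.google.datastore.v1.Value, the protobuf type used by the v1 Datastore client.
Value longString = Value.newBuilder()
    .setStringValue(...)
    .setExcludeFromIndexes(true)  // unindexed values are not subject to the 1500-byte limit
    .build();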

If you need compatibility with App Engine's com.google.appengine.api.datastore.Text type, you also need to set the meaning to 15:

Value longString = Value.newBuilder()
    .setStringValue(...)
    .setExcludeFromIndexes(true)
    .setMeaning(15)  // 15 marks the value so App Engine reads it as Text
    .build();

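Answer 1 (score: 2)

Datastore indexes every property by default, which is why a property value is limited to 1500 bytes by default. If you need to store something like a large JSON document, you can specify that the property should not be indexed: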

Entity newEntity = Entity.newBuilder(key)
    .set("time", Timestamp.parseTimestamp("1970-01-01T00:00:00Z"))
    .set("message", StringValue.newBuilder(JSON).setExcludeFromIndexes(true).build())
    .build();

This lets you save values larger than the default 1500-byte limit.
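The limit is counted in bytes, not characters, so a string containing non-ASCII characters can hit it well before it reaches 1500 characters. Here is a minimal sketch, using only the Java standard library, of a check you could run before building the entity to decide whether the value has to be excluded from indexes. The class name IndexLimitCheck and its helper methods are hypothetical, and it assumes the limit is measured on the UTF-8 encoding of the string.

```java
import java.nio.charset.StandardCharsets;

class IndexLimitCheck {
    // Limit quoted in the error message: indexed string values may not exceed 1500 bytes.
    static final int MAX_INDEXED_STRING_BYTES = 1500;

    // Assumption: the limit applies to the UTF-8 encoded form of the string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the string is too long to be stored as an indexed value.
    static boolean needsExcludeFromIndexes(String s) {
        return utf8Length(s) > MAX_INDEXED_STRING_BYTES;
    }

    // Small helper to build test strings without relying on String.repeat (Java 11+).
    static String repeat(String unit, int count) {
        StringBuilder sb = new StringBuilder(unit.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 1500);        // 1500 chars, 1 byte each
        String accented = repeat("\u00e9", 751); // 751 chars, 2 bytes each in UTF-8

        System.out.println(ascii.length() + " chars, " + utf8Length(ascii)
                + " bytes, exclude: " + needsExcludeFromIndexes(ascii));
        System.out.println(accented.length() + " chars, " + utf8Length(accented)
                + " bytes, exclude: " + needsExcludeFromIndexes(accented));
    }
}
```

The result of needsExcludeFromIndexes could be passed straight to setExcludeFromIndexes, although always excluding properties that you never query on is simpler.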

Answer 2 (score: 0)

To be precise:

StringValue.newBuilder(yourString).setExcludeFromIndexes(true).build()