(1) I have a BigQuery source table like...
column_name | is_nullable | data_type
OrderId | YES | STRING
items | NO | ARRAY<STRUCT<articleId STRING, quantity FLOAT64>>
From a relational point of view, "OrderId" would be the key.
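For reference, a minimal BigQuery Standard SQL sketch of a table with this shape (the dataset and table names my_dataset.orders are hypothetical):

-- Hypothetical names; the shape matches the schema listed above.
-- In BigQuery, an ARRAY column is implicitly non-nullable (is_nullable = NO).
CREATE TABLE IF NOT EXISTS my_dataset.orders (
  OrderId STRING,
  items ARRAY<STRUCT<articleId STRING, quantity FLOAT64>>
);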
(2) Now I want to normalize the ARRAY/STRUCT records into a separate table. For this I use the "Wrangler" transform.
Note: this is the "Wrangler" from the "Transform" section of Data Fusion Studio! When I instead try to open Wrangler via the hamburger menu and select the BQ source table, it reports: BigQuery type STRUCT is not supported.
The output of the source table is linked to the input of Wrangler.
In Wrangler, I defined...
combiOrderId | string | yes
items | array | no
record [ {articleId | string | yes}, {quantity | float | yes} ]
(3) The BQ sink table takes the Wrangler output as its input schema, and I defined the final schema as (Name | Type | Null):
combiOrderId | string | yes
articleId | string | yes
quantity | float | yes
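For illustration, the flattened sink table could look like this in BigQuery Standard SQL (a hedged sketch; my_dataset.order_items is a hypothetical name):

-- Hypothetical name; mirrors the sink schema above, one row per order item.
CREATE TABLE IF NOT EXISTS my_dataset.order_items (
  combiOrderId STRING,
  articleId STRING,
  quantity FLOAT64
);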
Now, when running the pipeline (in preview mode), the following error message is logged:
Problem converting into output record. Reason : Unable to decode array 'items'
(the full message is below)
Any hints or alternative solutions would be very welcome :-)
Thanks.
JSON of the Wrangler output schema:
[
  {
    "name": "etlSchemaBody",
    "schema": {
      "type": "record",
      "name": "etlSchemaBody",
      "fields": [
        {
          "name": "combiOrderId",
          "type": [
            "string",
            "null"
          ]
        },
        {
          "name": "items",
          "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "a6adafef5943d4757b2fad43a10732952",
              "fields": [
                {
                  "name": "articleId",
                  "type": [
                    "string",
                    "null"
                  ]
                },
                {
                  "name": "quantity",
                  "type": [
                    "float",
                    "null"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
]
Full (first) error log:
java.lang.Exception: Stage:Normalize-items - Reached error threshold 1, terminating processing due to error : Problem converting into output record. Reason : Unable to decode array 'items'
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:412) ~[1576661389534-0/:na]
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:94) ~[1576661389534-0/:na]
at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$transform$5(WrappedTransform.java:90) ~[cdap-etl-core-6.1.0.jar:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[cdap-etl-core-6.1.0.jar:na]
at io.cdap.cdap.etl.common.plugin.WrappedTransform.transform(WrappedTransform.java:89) ~[cdap-etl-core-6.1.0.jar:na]
at io.cdap.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:74) ~[cdap-etl-core-6.1.0.jar:na]
at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:50) ~[hydrator-spark-core2_2.11-6.1.0.jar:na]
at io.cdap.cdap.etl.spark.Compat$FlatMapAdapter.call(Compat.java:126) ~[hydrator-spark-core2_2.11-6.1.0.jar:na]
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:128) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.Task.run(Task.scala:109) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) [spark-core_2.11-2.3.3.jar:2.3.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
Caused by: io.cdap.wrangler.api.RecipeException: Problem converting into output record. Reason : Unable to decode array 'items'
at io.cdap.wrangler.executor.RecipePipelineExecutor.execute(RecipePipelineExecutor.java:102) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:384) ~[1576661389534-0/:na]
... 25 common frames omitted
Caused by: io.cdap.wrangler.utils.RecordConvertorException: Unable to decode array 'items'
at io.cdap.wrangler.utils.RecordConvertor.decodeArray(RecordConvertor.java:382) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.utils.RecordConvertor.decode(RecordConvertor.java:142) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.utils.RecordConvertor.decodeUnion(RecordConvertor.java:368) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.utils.RecordConvertor.decode(RecordConvertor.java:152) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.utils.RecordConvertor.decodeRecord(RecordConvertor.java:85) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.utils.RecordConvertor.toStructureRecord(RecordConvertor.java:56) ~[wrangler-core-4.1.3.jar:na]
at io.cdap.wrangler.executor.RecipePipelineExecutor.execute(RecipePipelineExecutor.java:99) ~[wrangler-core-4.1.3.jar:na]
... 26 common frames omitted
Answer 0 (score: 0)
Adding my comments as an answer.
Regarding debugging the error: probably the easiest way to inspect it is to navigate from Wrangler. Once you have done that, you should see the source and the Wrangler transform already configured; you can then add the sink and run a preview to test whether things work.
To address your other question: Wrangler only supports array types in the BQ source; it does not support reading the STRUCT type from BigQuery. My guess is that this is why you are seeing the issue. issues.cask.co/browse/CDAP-15665
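A possible workaround, offered as a sketch of my own under the assumption that the STRUCT limitation is indeed the cause (it is not part of the original answer): flatten the ARRAY<STRUCT> in BigQuery itself, for example in a view, and point the pipeline at the already-normalized rows so that Wrangler never has to decode the STRUCT. The names my_dataset.orders and order_items_flat are hypothetical:

-- Hypothetical names; UNNEST emits one output row per element of 'items',
-- which is exactly the normalization the question asks for.
CREATE OR REPLACE VIEW my_dataset.order_items_flat AS
SELECT
  o.OrderId AS combiOrderId,  -- renamed to match the intended sink schema
  i.articleId,
  i.quantity
FROM my_dataset.orders AS o,
     UNNEST(o.items) AS i;

The pipeline could then read this view through a plain BigQuery source and skip the STRUCT decoding entirely.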