How do I flatten a nested Avro record in a Pig query?

Posted: 2015-08-07 18:39:34

Tags: hadoop apache-pig hdfs avro

The Avro schema looks like this:

{
  "type" : "record",
  "name" : "name1",
  "fields" :
  [
    {
      "name" : "f1",
      "type" : "string"
    },
    {
      "name" : "f2",
      "type" :
      {
        "type" : "array",
        "items" :
        {
          "type" : "record",
          "name" : "name2",
          "fields" :
          [
            {
              "name" : "time",
              "type" : [ "float", "int", "double", "long" ]
            }
          ]
        }
      }
    }
  ]
}

After loading it in Pig:

grunt> A = load 'data' using AvroStorage();
grunt> DESCRIBE A;
A: {f1: chararray,f2: {ARRAY_ELEM: (time: (FLOAT: float,INT: int,DOUBLE: double,LONG: long))}}

What I think I want is a bag of (f1: chararray, timestamp: double). This is what I tried:

grunt> B = FOREACH A GENERATE f1, f2.time AS timestamp;
grunt> DESCRIBE B;
B: {f1: chararray,timestamp: {(time: (FLOAT: float,INT: int,DOUBLE: double,LONG: long))}}

So how do I flatten this record?

I'm new to Pig and Avro, and I'm not sure whether what I'm trying to do even makes sense. Thanks for your help.
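One possible direction, offered as a hedged sketch rather than a verified answer: in B, `timestamp` is still a bag because `f2.time` projects over the whole array, and Pig's `FLATTEN` operator un-nests a bag into one output row per element. The script assumes (1) the same `AvroStorage` loader used above, (2) that the union tuple's members keep the order reported by `DESCRIBE A` (FLOAT, INT, DOUBLE, LONG), and (3) that for each value exactly one member is non-null. The third assumption about how the loader fills the union tuple has not been checked. Positional references (`$0` to `$3`) are used because FLOAT, INT, DOUBLE and LONG are also Pig type keywords.

```pig
-- Hedged sketch. ASSUMPTION: each union value fills exactly one member of
-- the time tuple (FLOAT, INT, DOUBLE, LONG) and leaves the others null.
A = LOAD 'data' USING AvroStorage();

-- FLATTEN un-nests the bag produced by f2.time: one row per array element,
-- with f1 repeated on each row. t is the union tuple from DESCRIBE A.
B = FOREACH A GENERATE f1, FLATTEN(f2.time) AS (t);

-- ASSUMPTION: positions follow DESCRIBE A, i.e.
-- $0 = FLOAT, $1 = INT, $2 = DOUBLE, $3 = LONG.
-- Take whichever member is set and cast it to double.
C = FOREACH B GENERATE f1,
    (t.$2 IS NOT NULL ? t.$2 :
     (t.$3 IS NOT NULL ? (double) t.$3 :
      (t.$0 IS NOT NULL ? (double) t.$0 : (double) t.$1))) AS timestamp;

DESCRIBE C;
```

Note that flattening an empty bag produces no output rows, so records whose `f2` array is empty drop out of C. If those records must be kept, they need separate handling.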

0 answers