How do I define the schema of an array using JsonLoader?

Time: 2015-04-07 20:45:42

Tags: arrays json apache-pig elephantbird

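I am using the elephantbird project to load a JSON file into Pig, but I don't know how to define the schema at load time. I couldn't find any documentation that describes this.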

Data:

{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}
{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}

Code:

feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;

extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id,
    products_json#'name' AS name,
    products_json#'colors' AS colors,
    products_json#'sizes' AS sizes;

describe extracted_products;

Result:

extracted_products: {id: chararray,name: bytearray,colors: bytearray,sizes: bytearray}

How can I give these fields the correct schema (int, string, array, array), and how do I flatten the array elements into rows?

Thanks in advance.

1 answer:

Answer 0: (score: 0)

Convert the JSON arrays to bags of tuples:

feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;

extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id:chararray,
    products_json#'name' AS name:chararray,
    products_json#'colors' AS colors:{t:(i:chararray)},
    products_json#'sizes' AS sizes:{t:(i:chararray)};

Flatten a bag into rows:

flattened = FOREACH extracted_products GENERATE id, FLATTEN(colors);
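
A possible variation, in case the arrays do not arrive as bags: Elephant Bird's JsonLoader accepts a '-nestedLoad' option that loads nested JSON structures instead of flattening them to strings. The following is only a sketch and has not been run against the sample data. It assumes the Elephant Bird jars are already registered, that the loaded record is declared as a map, and that the explicit casts succeed on the loaded values. The names product_colors and color are made up for illustration.

```pig
-- Sketch only: assumes the Elephant Bird jars are registered beforehand
-- and that '-nestedLoad' delivers JSON arrays as bags.
feed = LOAD '$INPUT'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (products_json: map[]);

-- Explicit casts give id and name concrete types instead of bytearray.
-- FLATTEN emits one output row per element of the colors bag.
product_colors = FOREACH feed GENERATE
    (int) products_json#'id'         AS id,
    (chararray) products_json#'name' AS name,
    FLATTEN(products_json#'colors')  AS color;
```

The same pattern would apply to sizes. Flattening both colors and sizes in a single GENERATE produces the cross product of the two bags per product, so flatten them in separate statements if that is not what you want.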