Using Pig to flatten complex nested JSON and output a |-delimited file

Time: 2014-05-29 06:54:55

Tags: json apache-pig elephantbird

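I need to take some complex nested JSON and convert it to tab-delimited output, with one output row for each ts and y pair in the input JSON. I know how to write tab-delimited output, but I can't get the JSON flattened correctly. Any suggestions, given the input JSON and the desired output below? I'm loading the JSON with ElephantBird.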

I have the following input JSON:

{
    "gateway": [
        {"beer" : [
                {"change_date": "change_date"},
                {"type": "squirrel-pale-ale"},
                {"vendor": "foo-vendor"},
                {"size": "size"}
            ] 
        },
        {"name": "SBS01"},
        {"hw_version": "1.1"}
    ],
    "sensors": [
        [
            {"info": {
                "name": "fake-sensor01",
                "serial_number": "fakies40911",
                "type": "temperature"
                }
            },
            {"values": [
                    {"ts": 1400869261, "y": 998}, // "ts" is UNIX Epoch in UTC
                    {"ts": 1400869276, "y": 1002}
                ]
            }
        ],
        [
            {"info": {
                "name": "fake-sensor02",
                "serial_number": "fakies40944",
                "type": "flow"
                }
            },
            {"values": [
                    {"ts": 1400869294, "y": 54},
                    {"ts": 1400869303, "y": 76}
                ]
            }
        ]
    ]
}

I can load it with this Pig script:

register 's3://path-to-scripts/elephant-bird-core-4.5.jar';
register 's3://path-to-scripts/elephant-bird-hadoop-compat-4.5.jar';
register 's3://path-to-scripts/elephant-bird-pig-4.5.jar';
register 's3://path-to-scripts/json-simple-1.1.1.jar';

data = load 's3://path-to-data/example_record.json' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

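Now I want one flat tuple per ts and y pair, with the other attributes repeated on each tuple. I've tried various sequences of statements using flatten and generate, and referencing key/value pairs from the map, but I'm struggling. I'm looking for suggestions on how to get this result: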

(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor01, fakies40911, temperature, 1400869261, 998)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor01, fakies40911, temperature, 1400869276, 1002)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor02, fakies40944, flow, 1400869294, 54)
(SBS01, 1.1, change_date, squirrel-pale-ale, foo-vendor, size, fake-sensor02, fakies40944, flow, 1400869303, 76)
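
To make the target transformation concrete, here is a minimal sketch of the flattening logic in plain Python rather than Pig. It is not an ElephantBird or Pig solution, only an illustration of the intended result. The helper names (merge_single_key_maps, flatten_record, to_tsv) are hypothetical, and the inline // comment from the sample record is dropped so that the record parses as strict JSON.

```python
import json

# The example record from the question, minus the inline // comment,
# which strict JSON parsers reject.
RECORD = json.loads("""
{
  "gateway": [
    {"beer": [
      {"change_date": "change_date"},
      {"type": "squirrel-pale-ale"},
      {"vendor": "foo-vendor"},
      {"size": "size"}
    ]},
    {"name": "SBS01"},
    {"hw_version": "1.1"}
  ],
  "sensors": [
    [
      {"info": {"name": "fake-sensor01", "serial_number": "fakies40911", "type": "temperature"}},
      {"values": [{"ts": 1400869261, "y": 998}, {"ts": 1400869276, "y": 1002}]}
    ],
    [
      {"info": {"name": "fake-sensor02", "serial_number": "fakies40944", "type": "flow"}},
      {"values": [{"ts": 1400869294, "y": 54}, {"ts": 1400869303, "y": 76}]}
    ]
  ]
}
""")


def merge_single_key_maps(items):
    """Collapse a list of one-key dicts, e.g. [{"a": 1}, {"b": 2}], into one dict."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged


def flatten_record(record):
    """Return one tuple per (ts, y) pair, repeating the gateway and sensor fields."""
    gateway = merge_single_key_maps(record["gateway"])
    beer = merge_single_key_maps(gateway["beer"])
    prefix = (
        gateway["name"], gateway["hw_version"],
        beer["change_date"], beer["type"], beer["vendor"], beer["size"],
    )
    rows = []
    for sensor in record["sensors"]:
        # Each sensor is a two-element list: [{"info": {...}}, {"values": [...]}].
        parts = merge_single_key_maps(sensor)
        info = parts["info"]
        for value in parts["values"]:
            rows.append(prefix + (
                info["name"], info["serial_number"], info["type"],
                value["ts"], value["y"],
            ))
    return rows


def to_tsv(rows):
    """Join each row with tabs and the rows with newlines."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


if __name__ == "__main__":
    print(to_tsv(flatten_record(RECORD)))
```

The key observation is that both "gateway" and each entry of "sensors" are arrays of single-key maps, so merging them into one map first makes the fields addressable by name; the (ts, y) expansion is then a plain nested loop. An equivalent Pig script would need the same two steps, but that is not shown here.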

0 answers:

No answers