Splitting a JSON file using Apache Pig

Time: 2014-08-05 12:53:12

Tags: apache-pig

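I have a JSON input file that needs to be split into multiple files based on a key. The output should keep the same JSON format.

Example:

The key here is the value of the EVT.NAME field. Each record should be routed to an output file according to that value.

The input contains three distinct values (KEYPRESS, TUNE, TRICK), so three different output files should be created.

Input: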

    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}

Output 1:

    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}

Output 2:

    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}

Output 3:

    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
    {"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}

1 Answer:

Answer 0: (score: 0)

You can use JsonLoader and JsonStorage. See this article: http://joshualande.com/read-write-json-apache-pig

    table = LOAD 'file.json' USING JsonLoader('KEYPRESS:chararray, TUNE:chararray, TRICK:chararray');
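Note that KEYPRESS, TUNE and TRICK in the schema above are values of EVT.NAME in the sample input, not field names, so that schema would probably not match the records as shown. A minimal sketch of how the split might look instead is given here, assuming a schema that mirrors the sample records and using Pig's SPLIT operator to route each record by EVT.NAME. The relation names (events, keypress, tune, trick) and the output paths are hypothetical, and the built-in JsonLoader is assumed to read fields in the order the schema lists them.

```pig
-- Assumed schema: nested JSON objects are declared as tuples,
-- in the same order as the fields appear in the sample input.
events = LOAD 'file.json'
    USING JsonLoader('PV:chararray, DEV:(DEV_ID:chararray), EVT:(NAME:chararray, ETS:long, VALUE:(KEY:chararray)), HOST:chararray');

-- Route each record to a relation according to the value of EVT.NAME.
SPLIT events INTO
    keypress IF EVT.NAME == 'KEYPRESS',
    tune     IF EVT.NAME == 'TUNE',
    trick    IF EVT.NAME == 'TRICK';

-- Hypothetical output locations; each STORE writes a directory of part files.
STORE keypress INTO 'output/KEYPRESS' USING JsonStorage();
STORE tune     INTO 'output/TUNE'     USING JsonStorage();
STORE trick    INTO 'output/TRICK'    USING JsonStorage();
```

This approach requires the set of EVT.NAME values to be known when the script is written. Each STORE produces a directory rather than a single file, and the target directories must not already exist when the script runs.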