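I have a JSON input file that needs to be split into multiple files based on a key. The output files should keep the same JSON format as the input.
Example:
The key here is the value of EVT.NAME in each object; each record should be routed to an output file according to that value.
The input contains three distinct values (KEYPRESS, TUNE, TRICK), so three separate output files should be created.
Input: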
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}
Output 1:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
Output 2:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
Output 3:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}
Answer 0 (score: 0)
You can use JsonLoader and JsonStorage in Apache Pig. See this article: http://joshualande.com/read-write-json-apache-pig
table = LOAD 'file.json' USING JsonLoader('KEYPRESS:chararray,TUNE:chararray,TRICK:chararray');
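
For comparison, the routing described in the question can also be sketched outside Pig in plain Python. This is only a minimal illustration, not the answerer's method: the function name `split_by_event_name`, the `<NAME>.json` output naming, and the choice to buffer all lines in memory are assumptions made for the example. It assumes one JSON object per line, as in the sample input, and writes each original line back unchanged so the JSON format is preserved.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_event_name(input_path, output_dir):
    """Route each JSON line to <output_dir>/<EVT.NAME>.json (hypothetical naming).

    Returns a dict mapping each EVT.NAME value to the number of lines written.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    groups = defaultdict(list)
    with open(input_path, encoding="utf-8") as src:
        for raw in src:
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            # Parse only to read the routing key; the original text is kept as-is.
            name = json.loads(line)["EVT"]["NAME"]
            groups[name].append(line)

    # Assumption: EVT.NAME values are safe to use as file names.
    for name, lines in groups.items():
        target = out_dir / f"{name}.json"
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")

    return {name: len(lines) for name, lines in groups.items()}
```

Calling `split_by_event_name("file.json", "out")` on the sample input would be expected to create `out/KEYPRESS.json`, `out/TUNE.json`, and `out/TRICK.json`. Because every line is held in memory until the end, a very large input would likely call for writing to open file handles as lines are read instead.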