Is it possible to structure ingested JSON data using NiFi?

Time: 2017-01-13 00:10:59

Tags: json hadoop apache-nifi hortonworks-sandbox

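Is it possible to use NiFi to load a JSON file into a structured table?

I am calling the following weather forecast data (from 6,000 weather stations), which I am currently loading into HDFS. It all arrives on a single line: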

{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[{"i":"14","lat":"54.9375","lon":"-2.8092","name":"CARLISLE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"50.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"WNW","F":"-3","G":"25","H":"67","Pp":"0","S":"13","T":"2","V":"EX","W":"1","U":"1","$":"720"}}},{"i":"22","lat":"53.5797","lon":"-0.3472","name":"HUMBERSIDE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"24.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"NW","F":"-2","G":"43","H":"63","Pp":"3","S":"25","T":"4","V":"EX","W":"3","U":"1","$":"720"}}}, .....

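Ideally, I would like the data structured as a 6,000-row table, one row per weather station.

I have tried writing a schema to pass the above to Pig, but have not succeeded yet, probably because I am not familiar enough with JSON to translate it correctly.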

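While looking for a simple way to add some structure to the data, I found that NiFi has a PutHBaseJson processor.

Can anyone advise whether the PutHBaseJson processor can work with the data structure above? If so, could someone point me to a decent tutorial that gives me a starting point for the configuration?

Any guidance would be much appreciated.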

1 answer:

Answer 0 (score: 3)

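You may want to use the SplitJson processor to split the 6,000-record JSON structure into 6,000 individual flow files. If you need to "inject" the parameter definitions from the top-level response, you can use a ReplaceText or JoltTransformJSON operation to manipulate the individual JSON records. Yolanda Davis has written a good article describing how to perform Jolt transforms (JSON -> JSON) in NiFi.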
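To make the split-and-inject idea concrete, here is a minimal Python sketch of what the flow would do to the data. It is not NiFi configuration. It assumes the split happens at `$.SiteRep.DV.Location`, which is my reading of the sample rather than something the question states, and the helper names and the `described` field are made up for illustration.

```python
import json

# Trimmed copy of the response from the question: two stations, three parameters.
RAW = '''{"SiteRep":{"Wx":{"Param":[
  {"name":"F","units":"C","$":"Feels Like Temperature"},
  {"name":"T","units":"C","$":"Temperature"},
  {"name":"S","units":"mph","$":"Wind Speed"}]},
 "DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[
  {"i":"14","name":"CARLISLE AIRPORT","Period":{"type":"Day","value":"2017-01-13Z",
    "Rep":{"F":"-3","T":"2","S":"13","$":"720"}}},
  {"i":"22","name":"HUMBERSIDE AIRPORT","Period":{"type":"Day","value":"2017-01-13Z",
    "Rep":{"F":"-2","T":"4","S":"25","$":"720"}}}]}}}'''


def split_locations(doc):
    """Return one dict per station (assumed split point: SiteRep.DV.Location)."""
    return list(doc["SiteRep"]["DV"]["Location"])


def param_labels(doc):
    """Map each parameter code to its label and units from the top-level block."""
    return {
        p["name"]: {"label": p["$"], "units": p["units"]}
        for p in doc["SiteRep"]["Wx"]["Param"]
    }


def inject_labels(record, labels):
    """Return a copy of the record with each known Rep code described in full."""
    rep = record["Period"]["Rep"]
    described = {
        code: {"value": value, **labels[code]}
        for code, value in rep.items()
        if code in labels  # "$" has no parameter definition, so it is skipped
    }
    return {**record, "described": described}  # "described" is a hypothetical field


doc = json.loads(RAW)
labels = param_labels(doc)
records = [inject_labels(r, labels) for r in split_locations(doc)]
print(len(records), records[0]["described"]["T"])
```

Each element of `records` corresponds to what a single flow file would carry after the split, with the parameter definitions merged in.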

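Once you have individual flow files that each contain a single JSON record, putting them into HBase is very easy. Bryan Bende wrote an article describing the necessary configurations for the PutHBaseJson processor.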
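One thing to watch, as far as I understand PutHBaseJson: each top-level JSON field becomes a column, so the nested `Period` and `Rep` objects are easier to handle if you flatten them first (for example with a Jolt transform). The Python sketch below only shows the target shape. The `period_` and `rep_` prefixes are hypothetical naming choices, and using `i` as the row identifier is an assumption about what makes a sensible key.

```python
import json

# One station record from the question, as it would look after the split.
RECORD = {
    "i": "14", "lat": "54.9375", "lon": "-2.8092", "name": "CARLISLE AIRPORT",
    "country": "ENGLAND", "continent": "EUROPE", "elevation": "50.0",
    "Period": {
        "type": "Day", "value": "2017-01-13Z",
        "Rep": {"D": "WNW", "F": "-3", "G": "25", "H": "67", "Pp": "0",
                "S": "13", "T": "2", "V": "EX", "W": "1", "U": "1", "$": "720"},
    },
}


def flatten(record):
    """Return a single-level dict: station fields, then period and report fields."""
    flat = {k: v for k, v in record.items() if k != "Period"}
    period = record["Period"]
    flat["period_type"] = period["type"]    # hypothetical column name
    flat["period_value"] = period["value"]  # hypothetical column name
    for code, value in period["Rep"].items():
        flat["rep_" + code] = value         # hypothetical "rep_" prefix
    return flat


flat = flatten(RECORD)
print(json.dumps(flat))
```

With a record in this shape, every value is a plain string, and `i` could serve as the row identifier field if one row per station is what you want.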