Can't figure out how to use NiFi

Time: 2019-08-06 16:02:35

Tags: sql arrays json apache-nifi jsonpath

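I'm working on a personal project and am very new to JSON, NiFi, SQL, etc. (still learning), so please forgive any confusing language or solutions that may be very obvious. I can clarify as needed.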

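I need to take the JSON output of an API call to a website and insert it into a table on a local MariaDB server I've set up. The problem is that the JSON is nested, and two of the pieces of data I need to insert appear as variable object keys rather than as values, so I don't know how to extract them and put them into the table. Essentially, I think I need to identify the different parts of the JSON and insert them as values, but I don't know how to do that.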

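Specifically, I have tried the EvaluateJsonPath, SplitJson, and FlattenJson processors, but I couldn't get them to work. All I've managed to get is a single result for the whole document, not one result per entry.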

{"5381":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":74.0,"tm_def_snp":63.0,"temperature":58.0,"st_snp":8.0,"punts":4.0,"punt_yds":178.0,"punt_lng":55.0,"punt_in_20":1.0,"punt_avg":44.5,"humidity":47.0,"gp":1.0,"gms_active":1.0},

"1023":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":82.0,"tm_def_snp":56.0,"temperature":74.0,"off_snp":82.0,"humidity":66.0,"gs":1.0,"gp":1.0,"gms_active":1.0},

"5300":{"wind_speed":17.0,"tm_st_snp":27.0,"tm_off_snp":80.0,"tm_def_snp":64.0,"temperature":64.0,"st_snp":21.0,"pts_std":9.0,"pts_ppr":9.0,"pts_half_ppr":9.0,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl":4.0,"idp_sack":1.0,"idp_qb_hit":2.0,"humidity":100.0,"gp":1.0,"gms_active":1.0,"def_snp":23.0},

"608":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":53.0,"tm_def_snp":79.0,"temperature":88.0,"st_snp":4.0,"pts_std":5.5,"pts_ppr":5.5,"pts_half_ppr":5.5,"idp_tkl_solo":4.0,"idp_tkl_loss":1.0,"idp_tkl_ast":1.0,"idp_tkl":5.0,"humidity":78.0,"gs":1.0,"gp":1.0,"gms_active":1.0,"def_snp":56.0},

"3396":{"wind_speed":6.0,"tm_st_snp":20.0,"tm_off_snp":60.0,"tm_def_snp":70.0,"temperature":63.0,"st_snp":19.0,"off_snp":13.0,"humidity":100.0,"gp":1.0,"gms_active":1.0}}

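This is a snapshot of output that runs to several thousand lines. Each numeric key above (5381, 1023, 5300, etc.) is the player ID for the stats nested under it. I have a table with three columns: Player ID, Stat ID, and Stat Value. For example, the first snippet needs to go into my table like this: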

Player ID        Stat ID        Stat Value
5381             wind_speed     4.0
5381             tm_st_snp      26.0
5381             tm_off_snp     74.0

And so on, for every entry. But I don't know how to get NiFi to pick out the right data and insert it into the right columns.
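The reshaping being asked for is a two-level loop: the outer keys are player IDs, and each inner key/value pair is a stat ID and its value. A minimal sketch, run as a plain Groovy script outside NiFi, on a trimmed excerpt of the JSON above; the variable names are illustrative only:

```groovy
import groovy.json.JsonSlurper

// Trimmed excerpt of the API response shown above (illustrative subset of players/stats)
def json = '''{"5381":{"wind_speed":4.0,"tm_st_snp":26.0,"tm_off_snp":74.0},
               "1023":{"wind_speed":4.0,"off_snp":82.0}}'''

// Parse into nested maps: player ID -> (stat ID -> stat value)
def byPlayer = new JsonSlurper().parseText(json)

// Emit one [playerId, statId, statValue] row per inner key/value pair
def rows = []
byPlayer.each { playerId, stats ->
    stats.each { statId, statValue ->
        rows << [playerId, statId, statValue]
    }
}

// Print the rows in the same column order as the target table
rows.each { row -> println row.join('\t') }
```

Each printed line corresponds to one row of the Player ID / Stat ID / Stat Value table, so five rows come out of the two-player excerpt.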

1 answer:

Answer 0 (score: 0)

I believe you can use Jolt (the JoltTransformJSON processor) to transform the JSON into a format like this:

[
  {"playerId":"5381", "statId":"wind_speed", "statValue": 0.123},
  {"playerId":"5381", "statId":"tm_st_snp", "statValue": 0.456},
  ...
] 

and then use PutDatabaseRecord with a JSON record reader (for example, JsonTreeReader).
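No Jolt spec is shown here; as a rough illustration of the target shape only, the same reshaping written in Groovy yields the flat array of records that a JSON reader in PutDatabaseRecord would consume. The field names playerId, statId, and statValue follow the example above and are assumptions: they need to match (or be mapped to) your table's actual column names. The input is a trimmed, illustrative excerpt.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Trimmed excerpt of the nested API response (player ID -> stat ID -> value)
def json = '{"5381":{"wind_speed":4.0,"tm_st_snp":26.0},"1023":{"off_snp":82.0}}'
def byPlayer = new JsonSlurper().parseText(json)

// One flat record per (player, stat) pair, using the field names from the answer
def records = byPlayer.collectMany { playerId, stats ->
    stats.collect { statId, statValue ->
        [playerId: playerId, statId: statId, statValue: statValue]
    }
}

// Serialize as a JSON array of objects, the shape a record-oriented JSON reader expects
def out = JsonOutput.toJson(records)
println JsonOutput.prettyPrint(out)
```

Whatever produces this array (Jolt or otherwise), PutDatabaseRecord then only has to map three flat fields to three columns.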


Another approach is to use the ExecuteGroovyScript processor.

Add a new property named SQL.mydb to it and link it to your DBCP controller service,


then use the following script as the Script Body property:

import groovy.json.JsonSlurper
import groovy.json.JsonBuilder

def ff=session.get()
if(!ff)return


//read flow file content and parse it
def body = ff.read().withReader("UTF-8"){reader-> 
    new JsonSlurper().parse(reader) 
}

def results = []
//run inside a transaction on the SQL.mydb connection and insert rows in batches of 100
SQL.mydb.withTransaction{
    //replace mytable and the column names with your own table definition
    def cmd = 'insert into mytable(playerId, statId, statValue) values(?,?,?)'
    results = SQL.mydb.withBatch(100, cmd){statement->
        //outer keys are player IDs; inner key/value pairs are stat IDs and values
        body.each{pid,stats->
            stats.each{k,v->
                statement.addBatch(pid,k,v)
            }
        }
    }
}

//write the batch update counts returned by withBatch as the new flow file content
ff.write("UTF-8"){writer-> 
    new JsonBuilder(results).writeTo(writer) 
}
//transfer to success
REL_SUCCESS << ff