I am now using Structured Streaming to consume data from Kafka. The data in Kafka is in JSON format. The Kafka records I receive look like this:
JSON data:
{"actly_payed":"300.0","total_amount":"2893.0","org_id":"8888","product_id":"4819569","payed_date":"2019-10-31 20:34:04","id":"200946364","order_id":"100233856","product_name":"test product_name"}
Spark code:
Dataset<String> stringDataset = words.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterator<String> call(String s) throws Exception {
        List<String> list = new ArrayList<>();
        // handleJson is a helper of mine (not shown) that parses the raw string into a JSONObject.
        JSONObject jsonObject = handleJson(s);
        // Sample JSON data: {"actly_payed":"300.0","total_amount":"2893.0","org_id":"8888","product_id":"4819569","payed_date":"2019-10-31 20:34:04","id":"200946364","order_id":"100233856","product_name":"test product_name"}
        for (Map.Entry<String, Object> entry : jsonObject.entrySet()) {
            list.add(entry.getKey() + ":" + entry.getValue());
        }
        return list.iterator();
    }
}, Encoders.STRING());
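For context, words above is a streaming Dataset<String> read from Kafka, roughly like this (the bootstrap servers and topic name are placeholders for my real settings):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("kafka-json").getOrCreate();

// Subscribe to the Kafka topic and cast the binary value column to a string.
Dataset<String> words = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "orders")
        .load()
        .selectExpr("CAST(value AS STRING)")
        .as(Encoders.STRING());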
The result of this operation is shown below.
This DF has only a single value column, and the value is a JSON string:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"actly_payed":"300.0","total_amount":"2893.0","org_id":"8888","product_id":"4819569","payed_date":"2019-10-31 20:34:04","id":"200946364","order_id":"100233856","product_name":"test product_name"}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The value in my Dataset is a JSON string (in key:value form).
How can I query the data with spark.sql("select columns from tableName")?
I hope you can help.
Spark version used: 2.3.0
Language used: Java
Answer 0 (score: 0):
You just need to apply the following (Scala example):
import spark.implicits._  // required for .toDS

val stringDataset: Dataset[String] = Seq(
  """{"actly_payed":"300.0","total_amount":"2893.0","org_id":"8888","product_id":"4819569","payed_date":"2019-10-31 20:34:04","id":"200946364","order_id":"100233856","product_name":"test product_name"}"""
).toDS

// json(Dataset[String]) is available since Spark 2.2; the RDD overload is deprecated.
val df = spark.read.json(stringDataset)
df.show(false)
+-----------+---------+---------+------+-------------------+----------+-----------------+------------+
|actly_payed|id |order_id |org_id|payed_date |product_id|product_name |total_amount|
+-----------+---------+---------+------+-------------------+----------+-----------------+------------+
|300.0 |200946364|100233856|8888 |2019-10-31 20:34:04|4819569 |test product_name|2893.0 |
+-----------+---------+---------+------+-------------------+----------+-----------------+------------+
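Note that spark.read.json only works on a batch Dataset; it cannot be applied to a streaming Dataset read from Kafka. For the streaming case, the usual approach is from_json with an explicit schema, then a temp view for spark.sql. A minimal Java sketch, assuming all fields are kept as strings and that the view name "orders" is arbitrary:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

// Streaming sources cannot infer a schema, so declare it explicitly.
StructType schema = new StructType()
        .add("actly_payed", DataTypes.StringType)
        .add("total_amount", DataTypes.StringType)
        .add("org_id", DataTypes.StringType)
        .add("product_id", DataTypes.StringType)
        .add("payed_date", DataTypes.StringType)
        .add("id", DataTypes.StringType)
        .add("order_id", DataTypes.StringType)
        .add("product_name", DataTypes.StringType);

// Parse the JSON string in the value column into typed columns.
Dataset<Row> parsed = words.select(from_json(col("value"), schema).as("data"))
        .select("data.*");

// Register a temp view so the data can be queried with spark.sql.
parsed.createOrReplaceTempView("orders");
Dataset<Row> result = spark.sql("select order_id, total_amount from orders");

The query result is itself a streaming DataFrame, so it still needs a writeStream sink (for example the console sink) before anything is printed.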