Can anyone help with converting the following JSON into a Spark DataFrame using Java code?
Note: it is not a file.
Logic: listen to Kafka topic T1, read each record from the RDD, apply additional logic to convert the resulting data into a JSON object, and write it to another Kafka topic T2 (a rough sketch of the intended pipeline is shown below).
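For reference, a minimal sketch of that pipeline, assuming spark-streaming-kafka-0-10 is on the classpath; the broker address, group id, batch interval, and the toJson helper are placeholders, not the actual job:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class T1ToT2Pipeline {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("t1-to-t2");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "alarm-updates"); // placeholder group id

        // read every record from topic T1
        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Collections.singletonList("T1"), kafkaParams));

        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            // one producer per partition; properties are illustrative only
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                while (records.hasNext()) {
                    String json = toJson(records.next().value()); // record-level business logic goes here
                    producer.send(new ProducerRecord<>("T2", json)); // write the result to topic T2
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }

    // placeholder for the record-to-JSON transformation
    private static String toJson(String t1Record) {
        return t1Record;
    }
}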
The structure of T2 is as follows.
JSON:
[
  {
    "@tenant_id":"XYZ",
    "alarmUpdateTime":1526342400000,
    "alarm_id":"AB5C9123",
    "alarm_updates":[
      {
        "alarmField":"Severity",
        "new_value":"Minor",
        "old_value":"Major"
      },
      {
        "alarmField":"state",
        "new_value":"UPDATE",
        "old_value":"NEW"
      }
    ],
    "aucID":"5af83",
    "inID":"INC15234567",
    "index":"test",
    "product":"test",
    "source":"ABS",
    "state":"NEW"
  }
]
Classes created:

class Alarm {
    String @tenant_id;
    String alarm_id;
    .
    .
    List<AlarmUpdate> update;
    // getter and setter functions for all variables
}

class AlarmUpdate {
    String alarmField;
    String oldVal;
    String newVal;
    // getter and setter functions for all variables
}
class AppClass {
    public static void main(String[] args) {
        Alarm alarmObj = new Alarm();
        // set values for the variables in alarmObj

        Dataset<Row> results = jobCtx.getSparkSession()
                .createDataFrame(Arrays.asList(alarmObj), Alarm.class);
        // at this point the following errors are seen
    }
}
Error:
2018-05-15 13:40:48 ERROR JobScheduler - Error running job streaming job 1526406040000 ms.0
scala.MatchError: com.ca.alarmupdates.AlarmUpdate@48c8809b (of class com.ca.alarmupdates.AlarmUpdate)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:236) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:170) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:154) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:379) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1105) ~[spark-sql_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1105) ~[spark-sql_2.11-2.2.0.jar:2.2.0]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) ~[jaf-sdk-2.4.0.jar:?]
    at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1105) ~[spark-sql_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1103) ~[spark-sql_2.11-2.2.0.jar:2.2.0]
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.Iterator$class.toStream(Iterator.scala:1322) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.AbstractIterator.toStream(Iterator.scala:1336) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298) ~[jaf-sdk-2.4.0.jar:?]
    at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336) ~[jaf-sdk-2.4.0.jar:?]
    at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:406) ~[spark-sql_2.11-2.2.0.jar:2.2.0]
    at com.ca.alarmupdates.AlarmUpdates.lambda$null$0(AlarmUpdates.java:85) ~[classes/:?]
    at java.util.Arrays$ArrayList.forEach(Arrays.java:3880) ~[?:1.8.0_161]
    at com.ca.alarmupdates.AlarmUpdates.lambda$main$f87f782d$1(AlarmUpdates.java:58) ~[classes/:?]
    at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at scala.util.Try$.apply(Try.scala:192) ~[jaf-sdk-2.4.0.jar:?]
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) ~[jaf-sdk-2.4.0.jar:?]
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) ~[spark-streaming_2.11-2.2.0.jar:2.2.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
Answer 0 (score: 0):
You can use wholeTextFiles to read the json file and get the json text, and use it in the json api of SparkSession as
import org.apache.spark.sql.SparkSession;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
static SparkSession spark = SparkSession.builder().master("local").appName("simple").getOrCreate();
static JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
Dataset<Row> df = spark.read().json(sc.wholeTextFiles("path to json file").map(t -> t._2()));
df.show(false);
which should give you

+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
|@tenant_id|alarmUpdateTime|alarm_id|alarm_updates |aucID|inID |index|product|source|state|
+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
|XYZ |1526342400000 |AB5C9123|[[Severity,Minor,Major], [state,UPDATE,NEW]]|5af83|INC15234567|test |test |ABS |NEW |
+----------+---------------+--------+--------------------------------------------+-----+-----------+-----+-------+------+-----+
You can use master and appName as per your need.

Update

You have commented:
The way you do through file, can we do it with the object. I have to convert to ingest the data to the other T2

For that, say you have the record read from topic T1 as a string object:
 String t1Record = "[\n" +
" {\n" +
" \"@tenant_id\":\"XYZ\",\n" +
" \"alarmUpdateTime\":1526342400000,\n" +
" \"alarm_id\":\"AB5C9123\",\n" +
" \"alarm_updates\":[\n" +
" {\n" +
" \"alarmField\":\"Severity\",\n" +
" \"new_value\":\"Minor\",\n" +
" \"old_value\":\"Major\"\n" +
" },\n" +
" {\n" +
" \"alarmField\":\"state\",\n" +
" \"new_value\":\"UPDATE\",\n" +
" \"old_value\":\"NEW\"\n" +
" }\n" +
" ],\n" +
" \"aucID\":\"5af83\",\n" +
" \"inID\":\"INC15234567\",\n" +
" \"index\":\"test\",\n" +
" \"product\":\"test\",\n" +
" \"source\":\"ABS\",\n" +
" \"state\":\"NEW\"\n" +
" }\n" +
"]";
and you convert it to an RDD:

 JavaRDD<String> t1RecordRDD = sc.parallelize(Arrays.asList(t1Record));

Then you can apply the json api to convert it to a dataframe (see the sketch below), which should give you the same result as above.
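A minimal sketch of that final step, reusing the spark session and t1RecordRDD from the snippets above (df2 is just a fresh variable name):

 // read the JSON string RDD into a DataFrame and inspect it
 Dataset<Row> df2 = spark.read().json(t1RecordRDD);
 df2.show(false);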