I have a JSON file
{
"titlename": "periodic",
"atom": [
{
"usage": "neutron",
"dailydata": [
{
"utcacquisitiontime": "2017-03-27T22:00:00Z",
"datatimezone": "+02:00",
"intervalvalue": 28128,
"intervaltime": 15
},
{
"utcacquisitiontime": "2017-03-27T22:15:00Z",
"datatimezone": "+02:00",
"intervalvalue": 25687,
"intervaltime": 15
}
]
}
]
}
and I want to append
"matter":[
{
"usage":"neutron",
"intervalvalue":345678
},
...
]
where intervalvalue in matter should hold, for each usage, the aggregated sum of the dailydata intervalvalues. I am using Scala and I am already able to read the JSON file. Please help me aggregate and append!
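For the sample above, the aggregated value for usage "neutron" would be 28128 + 25687 = 53815, so the appended block should come out as:

"matter": [
  {
    "usage": "neutron",
    "intervalvalue": 53815
  }
]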
Answer 0 (score: 1)
You should use a DataFrame to get the desired JSON.
For that you have to convert the JSON file into a DataFrame, which can be done as:

// read the whole file as a single string so the multi-line JSON parses as one record
val json = sc.wholeTextFiles("path to the json file")
  .map(tuple => tuple._2.replace("\n", "").trim)
val df = sqlContext.read.json(json)
This will give you the output:
+--------------------------------------------------------------------------------------------------------+---------+
|atom |titlename|
+--------------------------------------------------------------------------------------------------------+---------+
|[[WrappedArray([+02:00,15,28128,2017-03-27T22:00:00Z], [+02:00,15,25687,2017-03-27T22:15:00Z]),neutron]]|periodic |
+--------------------------------------------------------------------------------------------------------+---------+
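If you are on Spark 2.x or later, sc and sqlContext are superseded by a SparkSession, and the built-in multiLine reader option replaces the wholeTextFiles workaround. A minimal sketch, assuming a SparkSession named spark (the app name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("aggregate-json").getOrCreate()
// multiLine lets the reader parse a JSON document that spans multiple lines
val df = spark.read.option("multiLine", true).json("path to the json file")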
You should extract the usage and the interval values from the DataFrame, which can be done as:

import org.apache.spark.sql.functions._

// explode the atom array, pull out usage, explode the nested dailydata array,
// then sum intervalvalue per usage
val tobemergedDF = df.withColumn("atom", explode(col("atom")))
  .withColumn("usage", col("atom.usage"))
  .withColumn("atom", explode(col("atom.dailydata")))
  .withColumn("intervalvalue", col("atom.intervalvalue"))
  .groupBy("usage").agg(sum("intervalvalue").as("intervalvalue"))
tobemergedDF would be:
+-------+-------------+
|usage |intervalvalue|
+-------+-------------+
|neutron|53815 |
+-------+-------------+
Now you can write the DataFrame out as JSON and merge the two files.
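A minimal sketch of that last step, assuming the original file fits in memory and that merged.json is just a placeholder output path; toJSON serializes each row of tobemergedDF to a JSON string:

// rows -> JSON strings -> one JSON array, e.g. [{"usage":"neutron","intervalvalue":53815}]
val matterJson = tobemergedDF.toJSON.collect().mkString("[", ",", "]")

// splice the matter array into the original document just before its closing brace
val original = scala.io.Source.fromFile("path to the json file").mkString.trim
val merged = original.stripSuffix("}") + ",\"matter\":" + matterJson + "}"

// write the merged document back out with plain Java IO (not Spark)
import java.nio.file.{Files, Paths}
Files.write(Paths.get("merged.json"), merged.getBytes("UTF-8"))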
Hope the answer is helpful.