How to combine Spark columns into JSON in Scala

Date: 2017-09-26 05:46:16

Tags: json scala apache-spark

I have a variable, constructed as follows, whose data was extracted with Spark SQL:

{
"resourceType" : "Test1",
"count" : 10,
"entry": [{
        "id": "112",
        "gender": "female",
        "birthDate": 1213999
    }, {
        "id": "urn:uuid:002e27cf-3cae-4393-89c5-1b78050d9428",
        "resourceType": "Encounter"
    }]
}

I would like the output in the following format:

    {
        "resourceType" : "Test1",
        "count" : 10,
        "entry" : [
            "resource" : {
                "id": "112",
                "gender": "female",
                "birthDate": 1213999
            },
            "resource" : {
                "id": "urn:uuid:002e27cf-3cae-4393-89c5-1b78050d9428",
                "resourceType": "Encounter"
            }
        ]
    }

I am basically new to Scala :) and need some help.

Edit: adding the Scala code that creates the JSON:

val bundle = endresult.groupBy("id")
    .agg(count("*") as "total", collect_list("resource") as "entry")
    .withColumn("resourceType", lit("Bundle"))
    .drop("id")
    .select(to_json(struct("resourceType", "entry")))
    .map(row => row.getString(0)
        .replace("\"entry\":[\"{", "\"entry\":[{")
        .replace("}\"]}", "}]}")   // should match at the end of the string ONLY (we might switch to a regex instead)
        .replace("}\",\"{", "},{") // a '.' was missing here in the original, which broke the method chain
        .replace("\\\"", "\"")
    )
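As a side note, the string replacements above can usually be avoided. The sketch below is untested against the asker's actual schema and assumes `endresult` has an `id` column and a struct-typed `resource` column: wrapping each collected element in `struct($"resource")` names the field `resource`, so `to_json` emits the `"resource"` wrapper (as a proper JSON object) directly:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._  // assumes an existing SparkSession named `spark`

// Each collected element becomes {"resource": {...}}, so the serialized
// output has the shape "entry":[{"resource":{...}},{"resource":{...}}]
// without any post-hoc string surgery.
val bundle = endresult
  .groupBy("id")
  .agg(
    count("*") as "total",
    collect_list(struct($"resource")) as "entry"
  )
  .withColumn("resourceType", lit("Bundle"))
  .drop("id")
  .select(to_json(struct($"resourceType", $"total", $"entry")))
```

If `resource` is stored as a JSON *string* rather than a struct, it would need to be parsed first with `from_json` and an explicit schema before collecting, otherwise the strings end up escaped inside the output (which is what the `replace` chain was compensating for).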

0 Answers:

There are no answers yet.