Saving a DataFrame as XML in Spark SQL

Time: 2016-12-03 04:17:38

Tags: apache-spark apache-spark-sql spark-dataframe

I am trying to save a DataFrame as XML using the commands below.

// Nest the flat joined columns into structs, innermost level first.
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`,
         struct(_ngr, _region, SetofValues) as Station
  from (
    select _xmlns, `md:Date`, `md:Creator`, _ngr, _region,
           struct(_dataType, _period, Value) as SetofValues
    from (
      select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period,
             struct(_VALUE, _time) as Value
      from df_h a
      left outer join df_ds b on a.batchId = b.batchId
      left outer join df_dsv c on b.batchId = c.batchId
      left outer join df_nv d on c.batchId = d.batchId
    )
  )
""")
// Write one part file, with one <NewTag> element per row.
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path)

The schema of final_df, which the commands above write out as XML, is as follows:

root
 |-- _xmlns: string (nullable = true)
 |-- md:Date: string (nullable = true)
 |-- md:Creator: string (nullable = true)
 |-- Station: struct (nullable = false)
 |    |-- _ngr: string (nullable = true)
 |    |-- _region: string (nullable = true)
 |    |-- SetofValues: struct (nullable = false)
 |    |    |-- _dataType: string (nullable = true)
 |    |    |-- _period: string (nullable = true)
 |    |    |-- Value: struct (nullable = false)
 |    |    |    |-- _VALUE: double (nullable = true)
 |    |    |    |-- _time: string (nullable = true)

How can I achieve the following output by creating an array of rows?

<ROWS>
<NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
        <SetofValues dataType="Total" period="15 min">
            <Value time="05:30:00">3.509</Value>
        </SetofValues>
    </Station>
</NewTag>
<NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
        <SetofValues dataType="Total" period="15 min">
            <Value time="05:45:00">2.6</Value>
        </SetofValues>
    </Station>
</NewTag>
<NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
        <SetofValues dataType="Total" period="15 min">
            <Value time="06:00:00">1.111</Value>
        </SetofValues>
    </Station>
</NewTag>
</ROWS>

I am unable to convert the separate rows into a list of arrays so that the XML contains an array.

1 answer:

Answer 0: (score: 0)

Late to the party, but in case anyone is wondering: your schema holds exactly one Value per SetofValues per Station per root row, like this:

Root  Station  Set  Value
Root  Station  Set  Value
Root  Station  Set  Value
Root  Station  Set  Value

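If you want that output, you need to reduce by key and turn "Value" into an array.

After reducing by the three keys (root, station, and set), your DataFrame would look like this: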

Root  Station  Set  [Value, Value, Value, ...]
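
One way to sketch that reduction with the DataFrame API is shown below. It is an illustration under several assumptions, not the asker's actual pipeline: it assumes Spark 2.x (where collect_list accepts struct columns), the object name GroupValuesSketch, the helper nestValues, and the sample rows are hypothetical, and the flat input stands in for the result of the innermost join in the question. Note that collect_list does not guarantee the order of the collected elements.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Hypothetical helper object, introduced only for this illustration.
object GroupValuesSketch {

  // Collapse one-row-per-reading into one row per (root, station, set),
  // gathering the readings into an array column named "Value".
  def nestValues(flat: DataFrame): DataFrame = {
    val grouped = flat
      .groupBy(
        col("_xmlns"), col("md:Date"), col("md:Creator"), // root-level key
        col("_ngr"), col("_region"),                      // station-level key
        col("_dataType"), col("_period"))                 // set-level key
      .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))

    // Rebuild the nesting used in the question, now with Value as an array.
    grouped.select(
      col("_xmlns"), col("md:Date"), col("md:Creator"),
      struct(
        col("_ngr"), col("_region"),
        struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
      ).as("Station"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("group-values-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the XML in the question.
    val flat = Seq(
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 3.509, "05:30:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 2.6, "05:45:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 1.111, "06:00:00")
    ).toDF("_xmlns", "md:Date", "md:Creator", "_ngr", "_region",
           "_dataType", "_period", "_VALUE", "_time")

    val nested = nestValues(flat)
    nested.printSchema() // Value should now appear as an array of structs

    spark.stop()
  }
}
```

The nested DataFrame could then be written with the same write.format("xml").option("rowTag","NewTag") call as in the question. With spark-xml, an array of structs is expected to come out as repeated Value elements inside SetofValues, but that is worth checking against the version of the library you are using.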