I build the final DataFrame with the following query and then save it as XML:
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`,
         struct(_ngr, _region, SetofValues) as Station
  from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region,
               struct(_dataType, _period, Value) as SetofValues
        from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period,
                     struct(_VALUE, _time) as Value
              from df_h a
              left outer join df_ds b on a.batchId = b.batchId
              left outer join df_dsv c on b.batchId = c.batchId
              left outer join df_nv d on c.batchId = d.batchId))
""")
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path)
The schema of the DataFrame that I save as XML with the command above is:
root
|-- _xmlns: string (nullable = true)
|-- md:Date: string (nullable = true)
|-- md:Creator: string (nullable = true)
|-- Station: struct (nullable = false)
| |-- _ngr: string (nullable = true)
| |-- _region: string (nullable = true)
| |-- SetofValues: struct (nullable = false)
| | |-- _dataType: string (nullable = true)
| | |-- _period: string (nullable = true)
| | |-- Value: struct (nullable = false)
| | | |-- _VALUE: double (nullable = true)
| | | |-- _time: string (nullable = true)
How can I achieve the following output by creating an array of rows?
<ROWS>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value time="05:30:00">3.509</Value>
</SetofValues>
</Station>
</NewTag>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value time="05:45:00">2.6</Value>
</SetofValues>
</Station>
</NewTag>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value time="06:00:00">1.111</Value>
</SetofValues>
</Station>
</NewTag>
</ROWS>
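The shape of this XML follows spark-xml's default options: a column whose name starts with `_` (the default `attributePrefix`) is written as an attribute, and the `_VALUE` column (the default `valueTag`) is written as the element's text, so a `Value` struct is emitted as `<Value time="05:30:00">3.509</Value>`. Below is a hypothetical, simplified model of just those two rules for a single struct, in plain Scala. `XmlConventionSketch` and `element` are made-up names, and this is not spark-xml's implementation.

```scala
// Hypothetical, simplified model of spark-xml's default naming rules
// (NOT spark-xml's own code): a field whose name starts with the attribute
// prefix "_" becomes an XML attribute, and the "_VALUE" field becomes the
// element's text content. Nested child elements are ignored to keep it short.
object XmlConventionSketch {
  val attributePrefix = "_"  // spark-xml default for the attributePrefix option
  val valueTag = "_VALUE"    // spark-xml default for the valueTag option

  def element(name: String, fields: Seq[(String, Any)]): String = {
    // Attributes: every prefixed field except the value tag itself.
    val attrs = fields.collect {
      case (key, v) if key.startsWith(attributePrefix) && key != valueTag =>
        " " + key.stripPrefix(attributePrefix) + "=\"" + v + "\""
    }.mkString
    // Text content: the value-tag field, if present.
    val text = fields.collectFirst { case (key, v) if key == valueTag => v.toString }.getOrElse("")
    "<" + name + attrs + ">" + text + "</" + name + ">"
  }
}
```

For example, `XmlConventionSketch.element("Value", Seq("_VALUE" -> 3.509, "_time" -> "05:30:00"))` produces the same `<Value>` line as the first row above.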
I can't figure out how to turn the separate rows into an array (list) so that the XML contains a repeated element.
Answer (score: 0)
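Late to the party, but in case anyone is wondering: your schema holds exactly one Value per SetofValues, per Station, per Root, so your rows look like this: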
Root Station Set Value
Root Station Set Value
Root Station Set Value
Root Station Set Value
If you want that output, you need to reduce by key and make Value an array.
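To make "reduce by key" concrete without Spark, here is a plain-Scala sketch using ordinary collections. It assumes the joined columns are modeled as a hypothetical `Reading` case class (the `_xmlns` column is left out for brevity), and groups on the three levels Root, Station and SetofValues, keeping every `(value, time)` pair for a key in one list.

```scala
// Plain-Scala sketch (no Spark) of "reduce by key and make Value an array".
// Reading and its field names are hypothetical stand-ins for the joined columns.
object ReduceByKeySketch {
  final case class Reading(date: String, creator: String,      // Root
                           ngr: String, region: String,        // Station
                           dataType: String, period: String,   // SetofValues
                           value: Double, time: String)        // Value

  // Composite key made of the three levels: Root, Station, SetofValues.
  type Key = ((String, String), (String, String), (String, String))

  // Collapse all readings that share a key into one list of (value, time),
  // preserving the input order within each group.
  def reduceByKey(rows: Seq[Reading]): Map[Key, Seq[(Double, String)]] =
    rows
      .groupBy(r => ((r.date, r.creator), (r.ngr, r.region), (r.dataType, r.period)))
      .map { case (key, group) => key -> group.map(r => (r.value, r.time)) }
}
```

With the three rows from the question, which share the same Root, Station and SetofValues, this yields a single key holding three `(value, time)` pairs.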
So after reducing by the three keys (Root, Station, Set), your DataFrame would look like this:
Root Station Set [Value, Value, Value, ...]
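In Spark, that reduction can be sketched with `groupBy` plus `collect_list`. This is a sketch under assumptions: the flat joined columns (the innermost select of the question's query, without the `struct(...)` wrappers) are available as one DataFrame, called `flat` here, and `NestValues`/`nest` are hypothetical names. It has not been checked against the asker's real data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Sketch: rebuild final_df so that each (Root, Station, SetofValues) key holds
// an array of Value structs instead of one row per Value.
// `flat` (hypothetical) must contain: _xmlns, md:Date, md:Creator, _ngr,
// _region, _dataType, _period, _VALUE, _time.
object NestValues {
  def nest(flat: DataFrame): DataFrame =
    flat
      // group on every column that identifies Root, Station and SetofValues
      .groupBy(col("_xmlns"), col("`md:Date`"), col("`md:Creator`"),
               col("_ngr"), col("_region"), col("_dataType"), col("_period"))
      // gather all readings of a key into array<struct<_VALUE, _time>>
      .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))
      // re-nest into the same Station / SetofValues layout as before
      .select(
        col("_xmlns"), col("`md:Date`"), col("`md:Creator`"),
        struct(
          col("_ngr"), col("_region"),
          struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
        ).as("Station"))
}
```

Saving the result with the question's own command, e.g. `NestValues.nest(flat).repartition(1).write.format("xml").option("rowTag", "NewTag").save(output_path)`, should then give one `<NewTag>` per key with a repeated `<Value>` element inside `<SetofValues>`, because spark-xml writes an array of structs as repeated elements. `collect_list` does not guarantee the order of the collected readings, so sort them explicitly if the order matters.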