Saving an RDD[Elem] to an XML file

Asked: 2017-08-11 15:39:57

Tags: xml scala rdd

I have an RDD of type Elem:

val clientXml: RDD[Elem] = parsedClient.filter(s => s.isSuccess).map(s => convertToXML.clientToXML(s.get))

This RDD contains a collection of elements of type Elem, each of which looks like this:

<client>
  <first>Alexandra</first>
  <last>Diaz</last>
  <title></title>
  <addresses>
    <address>
      <type>Home</type>
      <addr1>3255 Marsh Elder</addr1>
      <addr2></addr2>
      <city>La Jolla</city>
      <province>CA </province>
      <county>United States</county>
    </address>
  </addresses>
</client>

I want to save the entire RDD to a single XML file in the following format:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <client>
      <first>Alexandra</first>
      <last>Diaz</last>
      <title></title>
      <addresses>
        <address>
          <type>Home</type>
          <addr1>3255 Marsh Elder</addr1>
          <addr2></addr2>
          <city>La Jolla</city>
          <province>CA </province>
          <county>United States</county>
        </address>
      </addresses>
    </client>

So far I have managed to save a single element using the approach below, but I need to save all of the elements in one file:

val clientElem: Elem = clientXml.treeReduce((a,b) => a) 

XML.save("C:/Temp/Client.xml", clientElem.copy(), "UTF-8", true)

Note that .saveAsTextFile() is not what I am looking for.

1 Answer:

Answer 0: (score: 0)

I solved this by converting the RDD[Elem] to a List[Elem]:

// collect() brings every element to the driver, so the whole RDD must fit in driver memory
val clientXmlList: List[Elem] = clientXml.collect().toList

Then I created a <data> node with the elements of the List[Elem] embedded inside it:

val clientXmlElemData: Elem = <data>
  {clientXmlList.map(p => p.copy())}
</data>
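
A well-formed XML document has exactly one root element, which is why the collected elements are wrapped in a <data> node rather than written one after another. Below is a minimal sketch of that embedding, using two made-up <client> elements in place of the collected RDD contents. The sample values and the `WrapSketch` name are assumptions for illustration only:

```scala
import scala.xml.Elem

object WrapSketch {
  // Stand-ins for the result of clientXml.collect().toList (hypothetical sample data)
  val clients: List[Elem] = List(
    <client><first>Alexandra</first><last>Diaz</last></client>,
    <client><first>Jane</first><last>Doe</last></client>
  )

  // Embedding a sequence of nodes inside braces splices each one in as a child of <data>
  val data: Elem = <data>{clients}</data>

  // Number of <client> children directly under the root
  val clientCount: Int = (data \ "client").length
}
```

Because the list is spliced in as-is, the extra `p.copy()` call is not strictly needed for the embedding to work.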

Then I used the XML.write() method to write it to the XML file:

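import java.io.{BufferedWriter, File, FileOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import scala.xml.XML
import scala.xml.dtd.DocType

// use a null DocType so that no <!DOCTYPE ...> line is inserted into the output XML file
val doctype: DocType = null

// the target file "C:/Temp/Client.xml"
val file = new File("C:/Temp/Client.xml")

// create a BufferedWriter that encodes as UTF-8, matching the encoding named in the
// XML declaration (a plain FileWriter would use the platform default charset instead)
val bw = new BufferedWriter(
  new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))

// write the clientXmlElemData node to the file, with the XML declaration enabled (true)
XML.write(bw, clientXmlElemData, "UTF-8", true, doctype)

// close the BufferedWriter once the file has been written
bw.close()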
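
If the RDD is too large to collect comfortably, the same file can probably be produced by streaming the elements instead of building a List[Elem] first. The sketch below is one possible variant, not part of the original answer. `XmlFileSketch`, `writeAll`, and `writeToFile` are hypothetical names, and it assumes a root label of "data" as above:

```scala
import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter, Writer}
import java.nio.charset.StandardCharsets
import scala.xml.{Elem, XML}

object XmlFileSketch {
  // Writes the XML declaration, then every element inside a single root element.
  // Elements are consumed one at a time, so the full list is never held in memory.
  def writeAll(elems: Iterator[Elem], out: Writer, rootLabel: String = "data"): Unit = {
    out.write("<?xml version='1.0' encoding='UTF-8'?>\n")
    out.write(s"<$rootLabel>\n")
    elems.foreach { e =>
      // xmlDecl = false and doctype = null: emit only the element itself
      XML.write(out, e, "UTF-8", false, null)
      out.write("\n")
    }
    out.write(s"</$rootLabel>\n")
  }

  // Opens the file with an explicit UTF-8 encoder and always closes it, even on failure
  def writeToFile(elems: Iterator[Elem], path: String): Unit = {
    val out = new BufferedWriter(
      new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))
    try writeAll(elems, out) finally out.close()
  }
}
```

With Spark this could be called as `XmlFileSketch.writeToFile(clientXml.toLocalIterator, "C:/Temp/Client.xml")`. `toLocalIterator` fetches one partition at a time to the driver rather than collecting everything at once.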