Hadoop Scoobi: instantiate a DList from XML

Time: 2015-06-15 14:56:18

Tags: xml scala hadoop scoobi

I am fairly new to Scala, Hadoop & Scoobi.

We have some Hadoop jobs that process CSV files and run the usual Scoobi routines, like this:

  // Parse the input file
  val lines = fromTextFile(input)

  // Iterate on every element to generate the keys, and then aggregate it
  val counts = lines.mapFlatten( ... 

1. I have the impression that I can't do this for XML files. Is that so, or can I process XML with Scoobi?

2. I think I can parse the XML and flatten its nodes into lines with Scala's native XML support. But then how do I create a Scoobi DList from the result?

(Why? Because I will need to join it with another DList that comes from a CSV file.)

Note: My XML consists of nodes like the following:

 <add>
    <AdCampaign class="BCSAdCampaign">
        <Subscriber>TVC</Subscriber>
        <CampaignName>3402376</CampaignName>
        <CampaignId>1NTGXNAY</CampaignId>
        <AccountManager/>
        <FromDate>20130212</FromDate>
        <ToDate>20140207</ToDate>
        <ReportingInd>N</ReportingInd>
        <CampaignAdmin>NAWASTHI MCG-TVC</CampaignAdmin>
        <SalesChannel>TC8</SalesChannel>
        <Email/>
        <Advertiser>MU0</Advertiser>
        <Date>20150609</Date>
    </AdCampaign>
</add>
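
To make the flattening step in point 2 concrete, here is a minimal sketch that uses only `scala.xml` and no Scoobi API. The `Campaign` case class, the `FlattenCampaigns` object, the choice of four fields and the tab separator are all hypothetical and picked for illustration only. It also assumes the whole document fits in memory and that `scala.xml` is on the classpath (it ships with older Scala versions and is the separate scala-xml module in newer ones).

```scala
import scala.xml.XML

// Hypothetical record type: which fields to keep is an assumption.
case class Campaign(campaignId: String, subscriber: String,
                    fromDate: String, toDate: String)

object FlattenCampaigns {
  // Build one Campaign per <AdCampaign> element found under the root.
  def parse(xml: String): Seq[Campaign] = {
    val root = XML.loadString(xml)
    (root \\ "AdCampaign").map { node =>
      Campaign(
        (node \ "CampaignId").text.trim,
        (node \ "Subscriber").text.trim,
        (node \ "FromDate").text.trim,
        (node \ "ToDate").text.trim
      )
    }
  }

  // Render a record as one tab-separated line (separator is an assumption).
  def toLine(c: Campaign): String =
    Seq(c.campaignId, c.subscriber, c.fromDate, c.toDate).mkString("\t")

  def main(args: Array[String]): Unit = {
    // Shortened copy of the sample node from the question.
    val sample =
      """<add>
        |  <AdCampaign class="BCSAdCampaign">
        |    <Subscriber>TVC</Subscriber>
        |    <CampaignId>1NTGXNAY</CampaignId>
        |    <FromDate>20130212</FromDate>
        |    <ToDate>20140207</ToDate>
        |  </AdCampaign>
        |</add>""".stripMargin
    parse(sample).map(toLine).foreach(println)
  }
}
```

If lines like these were written to a text file first, they could presumably be read with `fromTextFile` in the same way as the CSV input; whether Scoobi can build a DList directly from parsed XML is still the open part of the question.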

0 answers:

No answers