How to read multiple row tags from multiple XML documents stored in a single CSV file using Spark Scala

Time: 2019-11-27 10:05:54

Tags: scala apache-spark databricks azure-databricks

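I have the sample file below: a CSV file in which every row contains an XML document. I want to read multiple row tags from these XML documents into the same DataFrame. I tried the following code: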

val DFcorporateData = spark.sqlContext.read.format("csv")
//.option("rowTag", "messageData")
.option("rowTag", "corporateData")
.xml("/FileStore/tables/new.csv")

I even tried supplying a schema for the same XML documents, but that did not work either. Does anyone know how to do this?

"CORPORATE","<?xml version=""1.0"" encoding=""UTF-8""?><dataFeed><messageData1><messageId>1304</messageId><creationDate>2018-10-10T10:48:31.874-06:00</creationDate><hash>a6361daf861b7ff9b35a7ece256b0b8002246bd2</hash><messageType>CORPORATE</messageType></messageData><corporateData><accountData><corporateId>2809816362</corporateId><ffpNumber>5004525913</ffpNumber><companyName>Accenture</companyName><taxId>987368159</taxId><status>A</status><enrollmentDate>2018-10-10T10:48:30.000-06:00</enrollmentDate><enrollmentChannel>V</enrollmentChannel></accountData><administratorData><firstName>Shiwali</firstName><lastName>Longfield</lastName><email>Alethia.Longfield655@corp.com</email><phoneNumber>5503725935</phoneNumber></administratorData><addressData><addressLine1>9717 Cottage Drive</addressLine1><addressLine2>Nvidia's Playground</addressLine2><addressLine3>Star Bazaar</addressLine3><zipCode>221915</zipCode><city>Hederabad</city><country>IND</country></addressData></corporateData></dataFeed>","18/10/10","ExtAccountData 28098","S",   
"CORPORATE","<?xml version=""1.0"" encoding=""UTF-8""?><dataFeed><messageData1><messageId>1306</messageId><creationDate>2018-10-10T10:48:32.986-06:00</creationDate><hash>4d811d7da5095a12cb5858ff35eb8a663a6f95d1</hash><messageType>CORPORATE</messageType></messageData><corporateData><accountData><corporateId>2809816363</corporateId><ffpNumber>5004525924</ffpNumber><companyName>Motorola</companyName><taxId>902616355</taxId><status>A</status><enrollmentDate>2018-10-10T10:48:32.000-06:00</enrollmentDate><enrollmentChannel>V</enrollmentChannel></accountData><administratorData><firstName>Rahul</firstName><lastName>Bethune</lastName><email>Rod.Bethune213@corp.com</email><phoneNumber>6296534973</phoneNumber></administratorData><addressData><addressLine1>7054 Brickyard St.</addressLine1><zipCode>07501</zipCode><city>Paterson</city><state>NJ</state><country>USA</country></addressData></corporateData></dataFeed>","18/10/10","ExtAccountData 28098","S", 
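One possible approach to the sample above, offered as a minimal sketch rather than a confirmed solution: `rowTag` is an option of the spark-xml data source, not of the CSV reader, so the sketch reads the file with Spark's built-in CSV reader and then extracts fields from the XML column with the built-in `xpath_string` SQL function. The object name `CorporateXmlFromCsv`, the function name `readCorporateData`, and the output column names are hypothetical. It assumes that `<messageData1>` in the sample is a typo for `<messageData>` (the closing tag is `</messageData>`), that the file has no header row, and that the XML sits in the second column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, regexp_replace}

// Hypothetical helper object; the names here are illustrative only.
object CorporateXmlFromCsv {

  def readCorporateData(spark: SparkSession, path: String): DataFrame = {
    // Read the file as ordinary CSV. The sample escapes quotes inside the
    // XML column by doubling them (""), so the escape character is set to ".
    val raw = spark.read
      .option("header", "false")
      .option("quote", "\"")
      .option("escape", "\"")
      .csv(path)

    // Assumption: <messageData1> is a typo for <messageData>. Without this
    // repair the XML is not well formed and XPath evaluation is likely to fail.
    val withXml = raw.withColumn(
      "xml",
      regexp_replace(col("_c1"), "<messageData1>", "<messageData>")
    )

    // Pull values from both the messageData and corporateData subtrees
    // into one flat DataFrame, one row per CSV line.
    withXml.select(
      col("_c0").as("recordType"),
      expr("xpath_string(xml, '/dataFeed/messageData/messageId')").as("messageId"),
      expr("xpath_string(xml, '/dataFeed/corporateData/accountData/corporateId')").as("corporateId"),
      expr("xpath_string(xml, '/dataFeed/corporateData/accountData/companyName')").as("companyName"),
      expr("xpath_string(xml, '/dataFeed/corporateData/addressData/city')").as("city"),
      expr("xpath_string(xml, '/dataFeed/corporateData/addressData/country')").as("country")
    )
  }
}
```

With the path from the question, this could be called as `CorporateXmlFromCsv.readCorporateData(spark, "/FileStore/tables/new.csv").show(false)`. All extracted values are strings; casting them to numeric or timestamp types is left out of the sketch.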

0 answers:

No answers