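I have an XML file and I am trying to query it with the Scala XML API. I have XPath-style queries that retrieve data from the XML tags. I want to retrieve the <price> value from each <market> element, selecting on two attributes, _id and type. I want to write a condition with && so that I get a single value for each price tag, for example where market _id = 1 && type = "A".
For reference, here is the XML: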
<publisher>
<book _id = "0">
<author _id="0">Dev</author>
<publish_date>24 Feb 1995</publish_date>
<description>Data Structure - C</description>
<market _id="0" type="A">
<price>45.95</price>
</market>
<market _id="0" type="B">
<price>55.95</price>
</market>
</book>
<book _id="1">
<author _id = "1">Ram</author>
<publish_date>02 Jul 1999</publish_date>
<description>Data Structure - Java</description>
<market _id="1" type="A">
<price>145.95</price>
</market>
<market _id="1" type="B">
<price>155.95</price>
</market>
</book>
</publisher>
The following code runs without errors:
import scala.xml._
object XMLtoCSV extends App {
val xmlLoad = XML.loadFile("C:/Users/sharprao/Desktop/FirstTry.xml")
val price = (((xmlLoad \ "book" filter { _ \ "@_id" exists (_.text == "0")}) \ "market" filter { _ \ "@_id" exists (_.text == "0")}) \ "price").text // wanted: 45.95
val price1 = (((xmlLoad \ "book" filter { _ \ "@_id" exists (_.text == "1")}) \ "market" filter { _ \ "@_id" exists (_.text == "1")}) \ "price").text // wanted: 155.95
println("price = " + price)
println("price1 = " + price1)
}
The output is:
price = 45.9555.95
price1 = 145.95155.95
The code above gives me both values for each book because I could not work out how to add the && condition.
Thanks in advance.
Answer 0 (score: 2)
You can write a custom predicate that checks multiple attributes:
def checkMarket(marketId: String, marketType: String)(node: Node): Boolean = {
node.attribute("_id").exists(_.text == marketId) &&
node.attribute("type").exists(_.text == marketType)
}
Then use it as a filter:
val price1 = (((xmlLoad \ "book" filter (_ \ "@_id" exists (_.text == "0"))) \ "market" filter checkMarket("0", "A")) \ "price").text
// 45.95
val price2 = (((xmlLoad \ "book" filter (_ \ "@_id" exists (_.text == "1"))) \ "market" filter checkMarket("1", "B")) \ "price").text
// 155.95
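For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The object name MarketFilterDemo and the helper hasAttrs are hypothetical names made up for illustration, and the XML is a trimmed inline copy of the sample from the question. It assumes the scala-xml module is on the classpath.

```scala
import scala.xml._

object MarketFilterDemo {
  // Trimmed inline copy of the question's sample, so no file path is needed.
  val doc: Elem = XML.loadString(
    """<publisher>
      |  <book _id="0">
      |    <market _id="0" type="A"><price>45.95</price></market>
      |    <market _id="0" type="B"><price>55.95</price></market>
      |  </book>
      |  <book _id="1">
      |    <market _id="1" type="A"><price>145.95</price></market>
      |    <market _id="1" type="B"><price>155.95</price></market>
      |  </book>
      |</publisher>""".stripMargin)

  // Hypothetical helper: true when every (name, value) pair matches an
  // attribute of the node. Note that `\@` returns "" for a missing attribute.
  def hasAttrs(pairs: (String, String)*)(node: Node): Boolean =
    pairs.forall { case (name, value) => (node \@ name) == value }

  // Select the book by _id, then the market by _id AND type, then read the price.
  def price(bookId: String, marketType: String): String = {
    val books   = (doc \ "book").filter(hasAttrs("_id" -> bookId))
    val markets = (books \ "market").filter(hasAttrs("_id" -> bookId, "type" -> marketType))
    (markets \ "price").text
  }

  def main(args: Array[String]): Unit = {
    println(price("0", "A"))
    println(price("1", "B"))
  }
}
```

Taking the attribute pairs as varargs means the same helper covers one, two, or more attribute conditions, which is the && behaviour asked for in the question.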
Answer 1 (score: 1)
If you are interested in getting a CSV-style file of the data, this is one way to write it:
(xmlLoad \ "book").flatMap { bk =>
(bk \ "market").flatMap { mkt =>
(mkt \ "price").map { p =>
Seq(
bk \@ "_id",
mkt \@ "_id",
mkt \@ "type",
p.text.toFloat
)
}
}
}.map { cols =>
cols.mkString("\t")
}.foreach {
println
}
It outputs the following:
0 0 A 45.95
0 0 B 55.95
1 1 A 145.95
1 1 B 155.95
A common pattern to recognize when writing Scala: most flatMap ... flatMap ... map chains can be rewritten as a for-comprehension:
for {
book <- xmlLoad \ "book"
market <- book \ "market"
price <- market \ "price"
} yield {
val cols = Seq(
book \@ "_id",
market \@ "_id",
market \@ "type",
price.text.toFloat
)
println(cols.mkString("\t"))
}
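The for-comprehension also gives a natural place for the && condition from the question: a guard (an `if` line) between the generators. The following is only a sketch under assumptions: GuardedRows and rows are hypothetical names, the XML is a trimmed inline copy of the question's sample, and the price is kept as text instead of being converted with toFloat.

```scala
import scala.xml._

object GuardedRows {
  // Hypothetical helper: one tab-separated row per price whose market has the
  // same _id as its book AND the requested type.
  def rows(doc: Node, marketType: String): Seq[String] =
    for {
      book   <- doc \ "book"
      market <- book \ "market"
      if (market \@ "_id") == (book \@ "_id") && (market \@ "type") == marketType
      price  <- market \ "price"
    } yield Seq(book \@ "_id", market \@ "_id", market \@ "type", price.text).mkString("\t")

  // Trimmed inline copy of the question's sample.
  val sample: Elem = XML.loadString(
    """<publisher>
      |  <book _id="0">
      |    <market _id="0" type="A"><price>45.95</price></market>
      |    <market _id="0" type="B"><price>55.95</price></market>
      |  </book>
      |  <book _id="1">
      |    <market _id="1" type="A"><price>145.95</price></market>
      |    <market _id="1" type="B"><price>155.95</price></market>
      |  </book>
      |</publisher>""".stripMargin)

  def main(args: Array[String]): Unit =
    rows(sample, "A").foreach(println)
}
```

Returning the rows instead of printing inside the yield keeps the comprehension free of side effects, so the caller decides whether to print them or write them to a file.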
Answer 2 (score: -1)
I used Spark with a HiveContext and was able to evaluate the XPath expression.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
object xPathReader extends App{
System.setProperty("hadoop.home.dir","D:\\IBM\\DB\\Hadoop\\winutils") // Path for my winutils.exe
val sparkConf = new SparkConf().setAppName("XMLParcing").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
val myXmlPath = "D:\\IBM\\DB\\xml"
val xmlRDDList = XmlFileUtil.withCharset(sc, myXmlPath, "UTF-8", "publisher") // XmlFileUtil is a private class in Scala, hence I created a Java class to use it.
import hiveContext.implicits._
val xmlDf = xmlRDDList.toDF("tempXMLTable")
xmlDf.registerTempTable("tempTable")
hiveContext.sql("select xpath_string(tempXMLTable,\"/book/@_id\") as BookId, xpath_float(tempXMLTable,\"/book/market[@_id='1' and @type='B']/price\") as Price from tempTable").show()
/* Output
+------+------+
|BookId| Price|
+------+------+
| 0| 55.95|
| 1|155.95|
+------+------+
*/
}
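The XPath predicate used in that query, `[@_id='1' and @type='B']`, can also be tried without Spark through the JDK's built-in javax.xml.xpath API. This is a sketch and not the method used in the answer above: XPathAndDemo and price are hypothetical names, and because the inline sample keeps the <publisher> root, the path here starts at /publisher.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory
import org.xml.sax.InputSource

object XPathAndDemo {
  // Trimmed inline copy of the question's sample.
  private val xml =
    """<publisher>
      |  <book _id="0">
      |    <market _id="0" type="A"><price>45.95</price></market>
      |    <market _id="0" type="B"><price>55.95</price></market>
      |  </book>
      |  <book _id="1">
      |    <market _id="1" type="A"><price>145.95</price></market>
      |    <market _id="1" type="B"><price>155.95</price></market>
      |  </book>
      |</publisher>""".stripMargin

  // Parse once into a DOM document that XPath expressions can run against.
  private val doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new InputSource(new StringReader(xml)))

  // Hypothetical helper: XPath's `and` plays the role of && here.
  // The arguments are spliced in unescaped, so pass trusted values only.
  def price(marketId: String, marketType: String): String = {
    val expr = s"/publisher/book/market[@_id='$marketId' and @type='$marketType']/price"
    XPathFactory.newInstance().newXPath().evaluate(expr, doc)
  }

  def main(args: Array[String]): Unit = {
    println(price("0", "B"))
    println(price("1", "B"))
  }
}
```

When nothing matches, this overload of evaluate returns an empty string, so callers may want to check for that before converting the result to a number.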