Spark SQL XML: How to access attributes in a tag

Time: 2016-11-29 09:38:57

Tags: xml scala apache-spark-sql

My XML schema is as follows:

root
 |-- generalinfo: string (nullable = true)
 |-- info: string (nullable = true)
 |-- info-info: string (nullable = true)
 |-- metadata: struct (nullable = true)
 |    |-- element: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |-- files: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- parents: string (nullable = true)
 |-- participants: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- signatures: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- size: string (nullable = true)
 |-- system-path: string (nullable = true)
 |-- satellite: string (nullable = true)
 |-- event: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- user: string (nullable = true)
 |-- version_id: string (nullable = true)

I read the XML file with the following code:

// Read each <product> element as one row; XML attributes become columns prefixed with "_".
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "product")
  .option("attributePrefix", "_")
  .load("datafiles/product.xml")

My problem is that all the tags come back empty.
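
For reference, here is a minimal sketch of how attributes are usually reached once spark-xml has loaded a file. It is not taken from my real data: the sample XML, the `id` and `unit` attributes, and the object name `XmlAttributeSketch` are all made up for illustration, and it assumes Spark 2.x with the spark-xml package on the classpath. With `attributePrefix` set to `_`, an attribute `id` on the row tag should appear as a column `_id`, and an element that has both an attribute and text should become a struct holding the attribute plus a `_VALUE` field (the default `valueTag`).

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object XmlAttributeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("xml-attribute-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample file: one <product> row with an attribute on the
    // row tag itself and an attribute on a nested element.
    val xml =
      """<products>
        |  <product id="p1">
        |    <size unit="MB">12</size>
        |    <user>alice</user>
        |  </product>
        |</products>""".stripMargin
    val path = Files.createTempFile("product", ".xml")
    Files.write(path, xml.getBytes("UTF-8"))

    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "product")      // must match the element that forms one row
      .option("attributePrefix", "_")   // attributes become columns named _<attribute>
      .load(path.toString)

    // Inspect the inferred schema first to see which attribute columns exist.
    df.printSchema()

    // Attribute on the row tag, then attribute and text of a nested element.
    df.select(col("_id"), col("size._unit"), col("size._VALUE"), col("user")).show()

    spark.stop()
  }
}
```

If `printSchema()` shows no `_`-prefixed fields at all, one thing worth checking is whether the `rowTag` value really matches an element name in the file.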

0 answers:

No answers yet