My XML schema is as follows:
root
|-- generalinfo: string (nullable = true)
|-- info: string (nullable = true)
|-- info-info: string (nullable = true)
|-- metadata: struct (nullable = true)
| |-- element: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- files: array (nullable = true)
| |-- element: string (containsNull = true)
|-- parents: string (nullable = true)
|-- participants: array (nullable = true)
| |-- element: string (containsNull = true)
|-- signatures: array (nullable = true)
| |-- element: string (containsNull = true)
|-- size: string (nullable = true)
|-- system-path: string (nullable = true)
|-- satellite: string (nullable = true)
|-- event: array (nullable = true)
| |-- element: string (containsNull = true)
|-- user: string (nullable = true)
|-- version_id: string (nullable = true)
I read the XML file with the following code:
df = sqlContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "product")
.option("attributePrefix", "_")
.load("datafiles/product.xml")
My problem is that every tag comes back null.
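One thing that may be worth ruling out is a mismatch between the names passed to the reader and the element names that actually occur in the file. Below is a minimal sketch in plain Python (standard library only, no Spark) that counts element names in an XML string. The `count_tags` helper, the sample document, and the `expected_fields` list are hypothetical and only for illustration; they are not the real `datafiles/product.xml`, and this check is not claimed to be the cause of the nulls.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text):
    """Count how often each element name occurs in an XML string."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


# Hypothetical sample data, NOT the real datafiles/product.xml.
sample = """<products>
  <product>
    <info>a</info>
    <size>10</size>
  </product>
  <product>
    <info>b</info>
  </product>
</products>"""

counts = count_tags(sample)

row_tag = "product"                        # same value as the rowTag option
expected_fields = ["info", "size", "user"]  # a few names taken from the schema

# A Counter returns 0 for names it has never seen.
missing = [name for name in expected_fields if counts[name] == 0]

print("rows found for rowTag:", counts[row_tag])
print("schema fields with no matching element:", missing)
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`. Note that for documents using XML namespaces, ElementTree reports tags in the form `{uri}name`, so the names would need to be compared with that prefix in mind.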