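I am having trouble loading Parquet files into a Hive table. I am using an Amazon EMR cluster with Spark for data processing, and I need to read the output Parquet files to validate my transformations. The Parquet files have the following schema: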
root
|-- ATTR_YEAR: long (nullable = true)
|-- afil: struct (nullable = true)
| |-- clm: struct (nullable = true)
| | |-- amb: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- cdTransRsn: string (nullable = true)
| | | |-- dist: struct (nullable = true)
| | | | |-- T: string (nullable = true)
| | | | |-- content: double (nullable = true)
| | | |-- dscStrchPurp: string (nullable = true)
| | |-- amt: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- T: string (nullable = true)
| | | |-- content: double (nullable = true)
| | |-- amtTotChrg: double (nullable = true)
| | |-- cdAccState: string (nullable = true)
| | |-- cdCause: string (nullable = true)
How can I create a Hive external table with this kind of nested schema and load the Parquet files into it for analysis?
Answer 0 (score: 0)
You can use Catalog.createExternalTable (Spark before 2.2) or Catalog.createTable (Spark 2.2 and later). The Catalog is accessed through a SparkSession instance:
val spark: SparkSession
spark.catalog.createTable(...)
The session must be initialized with Hive support enabled (enableHiveSupport() on the SparkSession builder).
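Putting the pieces together, a minimal sketch for Spark 2.2+ might look like the following. The table name `claims_parquet` and the S3 path are hypothetical placeholders, not values from the question. The sketch assumes the cluster's Spark is configured to use the Hive metastore, and that the schema can be inferred from the Parquet files themselves, so the nested structs do not have to be declared by hand.

```scala
import org.apache.spark.sql.SparkSession

object RegisterParquetTable {
  def main(args: Array[String]): Unit = {
    // Hive support is needed so the table definition is stored in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("register-parquet-table")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical values: replace with your own table name and output location.
    val tableName = "claims_parquet"
    val parquetPath = "s3://my-bucket/output/claims/"

    // Registers a table over the existing files at the given path;
    // the schema, including nested structs, is read from the Parquet files.
    spark.catalog.createTable(tableName, parquetPath, "parquet")

    // Nested struct fields can be addressed with dot notation.
    spark.table(tableName)
      .select("ATTR_YEAR", "afil.clm.amtTotChrg", "afil.clm.amb.dist.content")
      .show(10)

    spark.stop()
  }
}
```

On Spark versions before 2.2, the same call would be `spark.catalog.createExternalTable(tableName, parquetPath, "parquet")`. Because the table only points at the given path, dropping it should leave the Parquet files in place, but it is worth confirming that on a copy of the data first.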