Question

I am trying to create a Hive table from a Spark job, using data in the following format:

{'Group1': {[start=0, end=20]: 'Data goes here'}}

The Spark DataFrame schema for this is:

from pyspark.sql.types import IntegerType, MapType, StringType, StructField, StructType

MapType(StringType(),
        MapType(StructType([
                    StructField('start', IntegerType(), False),
                    StructField('end', IntegerType(), False)]),
                StringType()))

Printed out, it looks like this:

root
 |-- column_1: map (nullable = true)
 |    |-- key: string
 |    |-- value: map (valueContainsNull = true)
 |    |    |-- key: struct
 |    |    |    |-- start: integer (nullable = true)
 |    |    |    |-- end: integer (nullable = true)
 |    |    |-- value: string (valueContainsNull = true)

This seems to work just fine in Spark, but when I try to create a Hive table from this schema:

CREATE EXTERNAL TABLE test_table (
column_1 MAP<STRING, MAP<STRUCT<`start`:BIGINT,`end`:BIGINT>, STRING>>
)
STORED AS PARQUET
LOCATION 'path_to_files';

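Hive fails with the following error:

FAILED: ParseException cannot recognize input near 'STRUCT' '<' 'start' in primitive type specification

As far as I can tell, this looks like a legitimate table structure. I cannot find anything that says a struct cannot be used as a map key in Hive 2.0, and Spark 2.0 handles it just fine.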

Answer 1

In Hive, the keys of a Map column must be primitives (i.e. not a Struct).

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes
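If it helps to catch this before Hive does, here is a minimal sketch that walks a schema in the JSON form Spark produces (for example from `df.schema.jsonValue()`) and reports every map whose key is not a primitive. The function name `complex_map_keys`, the path notation, and the hand-written `question_schema` dict are my own illustrations, not part of Spark or Hive; the dict is meant to mirror the schema in the question.

```python
def complex_map_keys(node, path="root"):
    """Return the paths of all map types whose key is not a primitive."""
    found = []
    if not isinstance(node, dict):
        # Primitive types are plain strings such as "string" or "integer".
        return found
    kind = node.get("type")
    if kind == "struct":
        for field in node["fields"]:
            found += complex_map_keys(field["type"], f"{path}.{field['name']}")
    elif kind == "array":
        found += complex_map_keys(node["elementType"], f"{path}[]")
    elif kind == "map":
        if isinstance(node["keyType"], dict):
            # A dict here means struct, array or map: Hive rejects it as a key.
            found.append(path)
        found += complex_map_keys(node["keyType"], f"{path}.key")
        found += complex_map_keys(node["valueType"], f"{path}.value")
    return found


def int_field(name):
    # Helper for the hand-written example schema below (hypothetical).
    return {"name": name, "type": "integer", "nullable": False, "metadata": {}}


# Hand-written stand-in for the question's schema in Spark's JSON layout.
question_schema = {
    "type": "struct",
    "fields": [{
        "name": "column_1",
        "nullable": True,
        "metadata": {},
        "type": {
            "type": "map",
            "keyType": "string",
            "valueContainsNull": True,
            "valueType": {
                "type": "map",
                "keyType": {"type": "struct",
                            "fields": [int_field("start"), int_field("end")]},
                "valueType": "string",
                "valueContainsNull": True,
            },
        },
    }],
}

offending = complex_map_keys(question_schema)
print(offending)
```

For the schema in the question this flags the inner map (the value of `column_1`), which is the one keyed by the struct.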

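I would strongly recommend against using a struct as the key. In your example, how would I access the values of the map if I do not know the start or end? Does the user need to know the exact start and end, and do those change for every row in the table?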
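One possible way around this, sketched under my own assumptions rather than taken from the question: move start and end out of the key and into the value, so each group holds a list of records. In Hive that would correspond to a column type along the lines of MAP<STRING, ARRAY<STRUCT<`start`:INT,`end`:INT,`data`:STRING>>>. The plain-Python sketch below only shows the reshaping and a lookup by position; the field name `data`, the helper names, the tuple keys, the second sample range, and the inclusive bounds are all hypothetical.

```python
def ranges_to_records(groups):
    """Turn {group: {(start, end): data}} into {group: [{start, end, data}, ...]}."""
    return {
        group: [
            {"start": start, "end": end, "data": data}
            for (start, end), data in sorted(ranges.items())
        ]
        for group, ranges in groups.items()
    }


def lookup(records, position):
    """Return the data of every range containing `position` (bounds assumed inclusive)."""
    return [r["data"] for r in records if r["start"] <= position <= r["end"]]


# The first entry follows the question's example; the second is made up.
groups = {"Group1": {(0, 20): "Data goes here", (21, 40): "More data"}}

converted = ranges_to_records(groups)
print(converted["Group1"])
print(lookup(converted["Group1"], 5))
```

With this shape a reader can filter on start and end without having to know the exact pair in advance, which is the access problem described above.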

Hive: struct as key in map type while creating table
