Question

I am loading my logs from S3 into Hive using the following statement:

 CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;

My data looks like this:

{
  "col1": {
    "country": "India",
    "page": "/signup",
    "date": "2018-01-01"
  }
}

If I want to create partitions on col1.country, col1.page, and col1.date, how should I include them in the CREATE statement? I tried colName.fieldName, but it did not work.

Answer 1

You can try declaring the partition columns directly, without referring to the struct column name, as shown below:

 CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
partitioned by (country string, page string, `date` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;

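Note that external tables do not detect partitions automatically; you have to alter the table and add each partition, as shown below: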

ALTER TABLE logs ADD PARTITION (country='India', page='whatever', `date`='whatever') location '/hdfs/path/';

You might also need to repair the table at the end:

msck repair table schemaName.tableName;
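
Hive takes partition values from the directory layout (or from the values supplied at insert time), not from fields inside the JSON records. One way to populate the partitions from the nested struct is therefore a dynamic-partition insert. The following is a minimal sketch under stated assumptions, not a tested solution: `logs_raw`, `logs_partitioned`, and the `s3a://application-logs/partitioned/` location are hypothetical names introduced for illustration, with `logs_raw` standing in for the unpartitioned table from the question.

```sql
-- Hypothetical staging table over the raw JSON (same schema as in the question).
CREATE EXTERNAL TABLE logs_raw (
  `col1` struct<`country`:string,`page`:string,`date`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/';

-- Hypothetical partitioned target; its location (an assumption) must differ
-- from the location of the raw data.
CREATE EXTERNAL TABLE logs_partitioned (
  `col1` struct<`country`:string,`page`:string,`date`:string>
)
PARTITIONED BY (country string, page string, `date` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/partitioned/';

-- Allow Hive to derive every partition value from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The trailing SELECT columns map, in order, to the partition columns.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (country, page, `date`)
SELECT col1, col1.country, col1.page, col1.`date`
FROM logs_raw;

-- List the partitions that the insert registered.
SHOW PARTITIONS logs_partitioned;
```

Partitions written this way are registered in the metastore by the insert itself, so the ALTER TABLE and msck repair steps are only needed for directories that were created outside of Hive.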

How do I add partitions to a Hive table with nested data?