Question

I am trying to create a Hive table, but because of the folder structure it takes hours just to create the partitions.

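Below is an example of what I am currently using to create the table. It would be very helpful if I could filter which partitions get created.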

From the structure below, I need every child_company and every month, but only for one year and only for one type of report.

Is there a way to do something like set hcat.dynamic.partitioning.custom.pattern = '${child_company}/year=${2016}/${month}/report=${inventory}'; so that partitioning does not have to read through all of the folders (> 300k)?

Language: Hive

Version: 1.2

Interface: Qubole

use my_database;

set hcat.dynamic.partitioning.custom.pattern = '${child_company}/${year}/${month}/${report}';

drop table if exists table_1;

create external table table_1
(
    Date_Date    string,
    Product   string,
    Quantity    int,
    Cost  int
)
partitioned by
(
    child_company string,
    year          int,
    month         int,
    report        string
)

row format delimited fields terminated by '\t'
lines terminated by '\n'
location 's3://mycompany-myreports/parent/partner_company-12345';

-- recover partitions lists every folder under the table location (slow with > 300k folders)
alter table table_1 recover partitions;
show partitions table_1;
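One way to avoid listing the whole location is to skip `recover partitions` and register only the partitions that are needed with `alter table ... add partition ... location`, which points each partition straight at its folder. The sketch below assumes the folders follow the `<child_company>/<year>/<month>/<report>` layout from the custom pattern above; the child company name `child_a` and the unpadded month folder names (`1`, `2`) are hypothetical placeholders, not values from the question.

```sql
use my_database;

-- Register only the needed partitions; the other folders are never listed.
-- 'child_a' and the month folder names are hypothetical placeholders:
-- substitute the real child_company values and folder names.
alter table table_1 add if not exists
  partition (child_company='child_a', year=2016, month=1, report='inventory')
    location 's3://mycompany-myreports/parent/partner_company-12345/child_a/2016/1/inventory'
  partition (child_company='child_a', year=2016, month=2, report='inventory')
    location 's3://mycompany-myreports/parent/partner_company-12345/child_a/2016/2/inventory';
-- ...one partition clause per child_company and month...

-- Show only the partitions registered for the year and report of interest
show partitions table_1 partition (year=2016, report='inventory');
```

With many child companies, the partition clauses (child companies × 12 months) are repetitive enough that generating them with a small script is usually easier than writing them by hand.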

Filtering dynamic partitions in Apache Hive
