How to automatically load data into a partitioned table

Time: 2016-04-12 09:01:30

Tags: hive

I have created an external, partitioned table as follows:

  

CREATE EXTERNAL TABLE IF NOT EXISTS dividends (ymd STRING, dividend FLOAT) PARTITIONED BY (exchange STRING, symbol STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

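I want to load the data so that, for each unique combination of partition values, a new partition is created automatically and the matching rows go into it. Is there any way to do this?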

Sample data (columns: exchange, symbol, ymd, dividend):

NASDAQ,AMTD,2006-01-25,6.0
NASDAQ,AHGP,2009-11-09,0.44
NASDAQ,AHGP,2009-08-10,0.428
NASDAQ,AHGP,2009-05-11,0.415
NASDAQ,AHGP,2009-02-10,0.403
NASDAQ,AHGP,2008-11-07,0.39
NASDAQ,AHGP,2008-08-08,0.353
NASDAQ,AHGP,2008-05-09,0.288
NASDAQ,AHGP,2008-02-08,0.288
NASDAQ,AHGP,2007-11-07,0.265
NASDAQ,AHGP,2007-08-08,0.265
NASDAQ,AHGP,2007-05-09,0.25
NASDAQ,AHGP,2007-02-07,0.25
NASDAQ,AHGP,2006-11-07,0.215
NASDAQ,AHGP,2006-08-09,0.215
NASDAQ,ALEX,2009-11-03,0.315
NASDAQ,ALEX,2009-08-04,0.315
NASDAQ,ALEX,2009-05-12,0.315
NASDAQ,ALEX,2009-02-11,0.315
NASDAQ,ALEX,2008-11-04,0.315
NASDAQ,AFCE,2005-06-06,12.0
NASDAQ,ASRVP,2009-12-28,0.528
NASDAQ,ASRVP,2009-09-25,0.528
NASDAQ,ASRVP,2009-06-25,0.528
NASDAQ,ASRVP,2009-03-26,0.528
NASDAQ,ASRVP,2008-12-26,0.528
NASDAQ,ASRVP,2008-09-25,0.528
NASDAQ,ASRVP,2008-06-25,0.528

1 answer:

Answer 0 (score: 0)

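I was looking for the same thing. These are my steps: I created a staging table and loaded the CSV file into it, then created the partitioned table and loaded it from the staging table using dynamic partitioning.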

CREATE EXTERNAL TABLE stocks (exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT) LOCATION '/user/hduser/stocks';

CREATE EXTERNAL TABLE IF NOT EXISTS dividends_stage (
exchange STRING, symbol STRING, ymd STRING,
dividend FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hduser/div_stage';

hadoop fs -mv /user/hduser/dividends.csv /user/hduser/div_stage

CREATE EXTERNAL TABLE IF NOT EXISTS dividends (
ymd STRING,
dividend FLOAT) PARTITIONED BY (exchange STRING, symbol STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
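
The INSERT OVERWRITE statement that follows leaves both partition columns (exchange and symbol) dynamic. A minimal sketch of the session settings this usually needs, assuming a Hive installation still on its default configuration, where hive.exec.dynamic.partition.mode is "strict" and therefore requires at least one static partition value:

```sql
-- Enable dynamic partition inserts (already the default in Hive 0.9.0 and later)
SET hive.exec.dynamic.partition=true;
-- The default 'strict' mode rejects an INSERT in which every partition column
-- is dynamic; exchange and symbol are both dynamic here, so use nonstrict
SET hive.exec.dynamic.partition.mode=nonstrict;
```

If the data contains many distinct symbols, hive.exec.max.dynamic.partitions.pernode (default 100) and hive.exec.max.dynamic.partitions (default 1000) may also need to be raised before the insert succeeds.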

INSERT OVERWRITE TABLE dividends PARTITION (exchange, symbol) SELECT ymd, dividend, exchange, symbol FROM dividends_stage;

SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM dividends;

Hope this helps, and that it's not too late...