I created an external, partitioned table as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS dividends (ymd STRING, dividend FLOAT) PARTITIONED BY (exchange STRING, symbol STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
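I want to load the data so that a new partition is created automatically for each unique combination of partition values, with the matching rows going into it. Is there a way to do this?
Sample data below: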
NASDAQ,AMTD,2006-01-25,6.0
NASDAQ,AHGP,2009-11-09,0.44
NASDAQ,AHGP,2009-08-10,0.428
NASDAQ,AHGP,2009-05-11,0.415
NASDAQ,AHGP,2009-02-10,0.403
NASDAQ,AHGP,2008-11-07,0.39
NASDAQ,AHGP,2008-08-08,0.353
NASDAQ,AHGP,2008-05-09,0.288
NASDAQ,AHGP,2008-02-08,0.288
NASDAQ,AHGP,2007-11-07,0.265
NASDAQ,AHGP,2007-08-08,0.265
NASDAQ,AHGP,2007-05-09,0.25
NASDAQ,AHGP,2007-02-07,0.25
NASDAQ,AHGP,2006-11-07,0.215
NASDAQ,AHGP,2006-08-09,0.215
NASDAQ,ALEX,2009-11-03,0.315
NASDAQ,ALEX,2009-08-04,0.315
NASDAQ,ALEX,2009-05-12,0.315
NASDAQ,ALEX,2009-02-11,0.315
NASDAQ,ALEX,2008-11-04,0.315
NASDAQ,AFCE,2005-06-06,12.0
NASDAQ,ASRVP,2009-12-28,0.528
NASDAQ,ASRVP,2009-09-25,0.528
NASDAQ,ASRVP,2009-06-25,0.528
NASDAQ,ASRVP,2009-03-26,0.528
NASDAQ,ASRVP,2008-12-26,0.528
NASDAQ,ASRVP,2008-09-25,0.528
NASDAQ,ASRVP,2008-06-25,0.528
Answer 0 (score: 0)
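I was looking for this too. These are the steps I used: I created a staging table and loaded the CSV file into it, then created the partitioned table and loaded it from the staging table using dynamic partitioning.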
-- External table over the stock price files.
CREATE EXTERNAL TABLE stocks (
  `exchange` STRING,
  `symbol` STRING,
  `ymd` STRING,
  `price_open` FLOAT,
  `price_high` FLOAT,
  `price_low` FLOAT,
  `price_close` FLOAT,
  `volume` INT,
  `price_adj_close` FLOAT)
LOCATION '/user/hduser/stocks';
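-- Unpartitioned staging table that reads the raw CSV as-is.
CREATE EXTERNAL TABLE IF NOT EXISTS dividends_stage (
  `exchange` STRING,
  `symbol` STRING,
  `ymd` STRING,
  `dividend` FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser/div_stage';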
hadoop fs -mv /user/hduser/dividends.csv /user/hduser/div_stage
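-- Target table, partitioned by exchange and symbol.
CREATE EXTERNAL TABLE IF NOT EXISTS dividends (
  `ymd` STRING,
  `dividend` FLOAT)
PARTITIONED BY (`exchange` STRING, `symbol` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';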
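One thing these steps take for granted: dynamic partitioning has to be allowed in the session before the INSERT that follows will run. Because both partition columns are left dynamic, Hive's strict mode would reject the statement. A minimal sketch of the settings I would expect to need is below. The two limit values are illustrative assumptions, not numbers from my run, and the defaults vary between Hive versions.

```sql
-- Allow dynamic partition inserts (already the default in many versions).
SET hive.exec.dynamic.partition=true;

-- nonstrict lets every partition column be dynamic; strict mode
-- requires at least one static partition value in the PARTITION clause.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumed values: raise the caps if the data has many distinct
-- (exchange, symbol) pairs, otherwise the job may fail on the limit.
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
```

Note that in a dynamic partition insert the partition columns must come last in the SELECT list, in the same order as in the PARTITION clause.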
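-- Dynamic partition insert: the partition columns come last in the SELECT.
INSERT OVERWRITE TABLE dividends PARTITION (`exchange`, `symbol`)
SELECT `ymd`, `dividend`, `exchange`, `symbol`
FROM dividends_stage;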
-- Show which file (and therefore which partition directory) each row came from.
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM dividends;
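Another way to check that the partitions were created is to list them and then query one directly. This is only a sketch, assuming the table is named dividends as above and using a symbol from the sample data:

```sql
-- List the partitions Hive created, one per distinct (exchange, symbol) pair.
SHOW PARTITIONS dividends;

-- Filtering on the partition columns lets Hive read only the matching directory.
SELECT `ymd`, `dividend`
FROM dividends
WHERE `exchange` = 'NASDAQ' AND `symbol` = 'AHGP';
```

SHOW PARTITIONS prints entries of the form exchange=NASDAQ/symbol=AHGP, which mirror the subdirectories under the table's location.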
Hope this helps, and that it's not too late...