Reloading data from one table into another table in Hive

Time: 2014-07-10 12:07:30

Tags: hadoop mapreduce hive

I am loading data from one table into another table, and the new table has different attributes than the original one.

I am running into the following problem during the load... any help in resolving this?

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"mdse_item_i":671841,"co_loc_i":146,"persh_expr_d":"2014-05-01","greg_d":"2013-06-17","persh_oh_q":16.0,"crte_btch_i":765,"updt_btch_i":765,"range_n":"ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"mdse_item_i":671841,"

My old table definition:

hive> describe nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV;
OK
mdse_item_i     int
co_loc_i        int
persh_expr_d    string
greg_d  string
persh_oh_q      double
crte_btch_i     int
updt_btch_i     int
range_n string

Time taken: 0.058 seconds

My new table definition is as follows:

hive> describe ITEM_LOC_DAY_PERSH_OH_INV;
OK
mdse_item_i     int     from deserializer
co_loc_i        int     from deserializer
persh_expr_d    string  from deserializer
greg_d  string  from deserializer
persh_oh_q      string  from deserializer
crte_btch_i     int     from deserializer
updt_btch_i     int     from deserializer
greg_date       string
Time taken: 0.241 seconds

The new table was created using an Avro schema:

CREATE external TABLE ITEM_LOC_DAY_PERSH_OH_INV
partitioned by (greg_date string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
Location '/common/TD/INV_new/ITEM_LOC_DAY_PERSH_OH_INV/'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///common/TD/INV_new/ITEM_LOC_DAY_PERSH_OH_INV/ITEM_LOC_DAY_PERSH_OH_INV.avs');
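
For what it's worth, the schema file the table points at can be inspected straight from the Hive shell, as a sanity check that the .avs contents on HDFS really match what describe shows (just a check; this output is not included above):

hive> dfs -cat /common/TD/INV_new/ITEM_LOC_DAY_PERSH_OH_INV/ITEM_LOC_DAY_PERSH_OH_INV.avs;
hive> DESCRIBE FORMATTED ITEM_LOC_DAY_PERSH_OH_INV;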

The load command we are using:

INSERT INTO TABLE ITEM_LOC_DAY_PERSH_OH_INV PARTITION (greg_date)
SELECT 
mdse_item_i,
co_loc_i,
persh_expr_d,
greg_d,
persh_oh_q,
crte_btch_i,
updt_btch_i,
greg_d
FROM nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV
WHERE range_n='ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31';

We are using dynamic partitioning for the load!
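
For reference, dynamic partition inserts normally need session settings along these lines (shown as a sketch; our exact session settings may differ):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;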

What we are actually trying to do is re-partition the table on a different column, and the schema was modified at the same time.
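
One difference worth noting is that persh_oh_q is double in the old table but string in the new Avro-backed table. Purely as an illustration (we have not confirmed this is the cause), the changed column could be cast explicitly in the SELECT:

INSERT INTO TABLE ITEM_LOC_DAY_PERSH_OH_INV PARTITION (greg_date)
SELECT
mdse_item_i,
co_loc_i,
persh_expr_d,
greg_d,
CAST(persh_oh_q AS STRING),
crte_btch_i,
updt_btch_i,
greg_d
FROM nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV
WHERE range_n='ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31';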

The same approach works for other tables... but we are facing this problem only with this one table...
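
Since the stack trace names the exact row that fails, a query like the following could isolate just that row from the source table and test the same conversion on it (a diagnostic sketch using the key values from the error message):

SELECT persh_oh_q, CAST(persh_oh_q AS STRING)
FROM nonclickstream.ITEM_LOC_DAY_PERSH_OH_INV
WHERE range_n='ITEM_LOC_DAY_PERSH_OH_INV_2013-04-01_2013-07-31'
AND mdse_item_i=671841
AND co_loc_i=146
AND greg_d='2013-06-17';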

0 Answers:

There are no answers