Question

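When I run a "Create Table as Select" query from the Hive CLI, the table is created but no data is populated. When I run the same query in Hive Beeswax, the target table is created and populated with data.

Here is the query: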

    hive -e '
    create table table_validation as
    select listing_id, city, area, expected_amount_inr, property_id, house_type,
           case when area_builtup_sqft is NULL or area_builtup_sqft = 0 or area_builtup_sqft = " "
                then plot_area else area_builtup_sqft end as area_sqft,
           case when area_builtup_sqft is NULL or area_builtup_sqft = 0 or area_builtup_sqft = " "
                then expected_amount_inr/plot_area else expected_amount_inr/area_builtup_sqft end as price_sqft,
           listing_state,
           case when house_type like "apartment" then "apartment"
                when house_type like "plot" then "plot"
                else "others" end as property_type,
           case when house_type like "plot" then "NA"
                when num_bedrooms between 1 and 1.9 then 1
                when num_bedrooms between 2 and 2.9 then 2
                when num_bedrooms between 3 and 3.9 then 3
                when num_bedrooms >= 4 then 4
                else num_bedrooms end as number_bedrooms
    from realestate_listing_main
    where listing_type LIKE "rent"
    and added_on between '2015-02-01' and '2015-03-31'
    ' --database default;

When I run this query, I get the following output:

    running hive query
    0 2015-03-31 18:40:41,025 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    2015-03-31 18:40:41,030 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    2015-03-31 18:40:41,030 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    2015-03-31 18:40:41,030 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    2015-03-31 18:40:41,031 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    2015-03-31 18:40:41,031 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-03-31 18:40:41,031 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    2015-03-31 18:40:41,336 WARN  [main] conf.HiveConf (HiveConf.java:initialize(1155)) - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

    Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.12.0-cdh5.1.2.jar!/hive-log4j.properties
    OK
    Time taken: 0.621 seconds
    Total MapReduce jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1427789583342_0014, Tracking URL = http://ip-10-172-133-249.ap-southeast-1.compute.internal:8088/proxy/application_1427789583342_0014/
    Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1427789583342_0014
    Hadoop job information for Stage-1: number of mappers: 10; number of reducers: 0
    2015-03-31 18:40:59,849 Stage-1 map = 0%,  reduce = 0%
    2015-03-31 18:41:10,188 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:11,219 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:12,252 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:13,289 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:14,321 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:15,357 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 5.86 sec
    2015-03-31 18:41:16,393 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 39.78 sec
    2015-03-31 18:41:17,428 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 41.17 sec
    2015-03-31 18:41:18,460 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 43.26 sec
    2015-03-31 18:41:19,499 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 49.68 sec
    2015-03-31 18:41:20,536 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 50.49 sec
    2015-03-31 18:41:21,569 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:22,598 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:23,627 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:24,655 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:25,684 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:26,714 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:27,743 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:28,773 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 56.28 sec
    2015-03-31 18:41:29,803 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 61.88 sec
    2015-03-31 18:41:30,840 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 63.8 sec
    2015-03-31 18:41:31,872 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 63.8 sec
    2015-03-31 18:41:32,905 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 69.86 sec
    2015-03-31 18:41:33,935 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 71.58 sec
    2015-03-31 18:41:34,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 71.58 sec
    MapReduce Total cumulative CPU time: 1 minutes 11 seconds 580 msec
    Ended Job = job_1427789583342_0014
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://ip-10-172-133-249.ap-southeast-1.compute.internal:8020/tmp/hive-root/hive_2015-03-31_18-40-42_689_38529489390850959-1/-ext-10001
    Moving data to: hdfs://ip-10-172-133-249.ap-southeast-1.compute.internal:8020/user/hive/warehouse/default.db/table_validation
    Table default.table_validation stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 0, raw_data_size: 0]
    MapReduce Jobs Launched: 
    Job 0: Map: 10   Cumulative CPU: 71.58 sec   HDFS Read: 2635527679 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 11 seconds 580 msec
    OK
    Time taken: 52.896 seconds

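It did not execute the second and third jobs. However, when I run the query in Hive Beeswax, all of the jobs are executed and the table is created with data.

Please let me know what I am missing. I have been stuck on this problem for the past 3 days.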

Answer 1

Found the answer. The SerDe jar needs to be added before running the query, because without this jar Hive cannot recognize the data.
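
As a rough sketch of what that could look like from the shell: Hive's `ADD JAR` statement registers a jar for the current CLI session, so it can be placed in front of the CTAS statement in the same `hive -e` call. The jar path below is a placeholder (the actual SerDe jar is not named here), and the SELECT list is shortened for readability; substitute the full query from the question.

```shell
#!/bin/sh
# Hypothetical path: replace with the SerDe jar that the source table actually uses.
SERDE_JAR="/path/to/serde.jar"

# ADD JAR registers the jar for this CLI session before the CTAS statement runs.
# The SELECT list is abbreviated; use the full query from the question.
QUERY="ADD JAR ${SERDE_JAR};
CREATE TABLE table_validation AS
SELECT listing_id, city, area
FROM realestate_listing_main
WHERE listing_type LIKE 'rent';"

# Run the query only if the hive CLI is installed; otherwise just print it.
if command -v hive >/dev/null 2>&1; then
    hive --database default -e "$QUERY"
else
    echo "hive not found on PATH; would run:"
    echo "$QUERY"
fi
```

Building the query in a double-quoted shell variable also lets the SQL string literals use single quotes without ending the shell string early.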
