我正在阅读Tom White的O'Reilly Hadoop书中的Hive教程。我正在尝试制作一个表格,但我无法让Hive创建存储桶。我可以创建表并将数据加载到其中,但所有数据都存储在一个文件中。
我正在运行伪分布式Hadoop集群。我正在使用Hadoop 1.2.1和Hive 0.10.0以及MySql Metastore。
数据(如下所示)最初位于“用户”表中。它们将放在一个有4个桶的桌子上,即每桶一个用户。
select * from users;
OK
id name
0 Nat
2 Joe
3 Kay
4 Ann
我设置下面的属性以尝试强制执行bucketing(我不认为显式设置mapred.reduce.tasks是必要的,但我包含它以防万一)。
set hive.enforce.bucketing=true;
set mapred.reduce.tasks=4;
然后我创建表'bucketed_users'并将数据加载到其中。
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id)
SORTED BY (id ASC) INTO 4 BUCKETS;
INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;
输出:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/katrina/katrina_20131003204949_a56048f5-ab2f-421b-af45-9ec3ff85731c.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-10-03 20:49:34,011 null map = 0%, reduce = 0%
2013-10-03 20:49:35,026 null map = 0%, reduce = 100%
Ended Job = job_local1250355097_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Loading data to table records.bucketed_users
Deleted hdfs://localhost/user/hive/warehouse/records/bucketed_users
Table records.bucketed_users stats: [num_partitions: 0, num_files: 1, num_rows: 4, total_size: 24, raw_data_size: 20]
OK
id name
Time taken: 8.527 seconds
数据已正确加载到'bucketed_users'(SELECT * FROM bucketed_users
显示所有用户)但创建的文件数量仅为1(num_files: 1
以上)而不是所需的4.查看bucketed_users HDFS中的目录(dfs -ls /user/hive/warehouse/records/bucketed_users;
)只显示一个文件000000_0。我该如何强制执行分组?
完整日志如下:
2013-10-03 20:49:30,769 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Execution log at: /tmp/katrina/katrina_20131003204949_a56048f5-ab2f-421b-af45-9ec3ff85731c.log
2013-10-03 20:49:31,139 INFO exec.ExecDriver (ExecDriver.java:execute(328)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
2013-10-03 20:49:31,144 INFO exec.ExecDriver (ExecDriver.java:execute(350)) - adding libjars: file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar
2013-10-03 20:49:31,144 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(852)) - Processing alias users
2013-10-03 20:49:31,145 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(870)) - Adding input file hdfs://localhost/user/hive/warehouse/records/users
2013-10-03 20:49:31,145 INFO exec.Utilities (Utilities.java:isEmptyPath(1900)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/records/users
2013-10-03 20:49:31,365 WARN util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(52)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-10-03 20:49:32,410 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(219)) - Making Temp Directory: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000
2013-10-03 20:49:32,420 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(746)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2013-10-03 20:49:32,648 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(46)) - Snappy native library not loaded
2013-10-03 20:49:32,655 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/user/hive/warehouse/records/users; using filter path hdfs://localhost/user/hive/warehouse/records/users
2013-10-03 20:49:32,661 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(199)) - Total input paths to process : 1
2013-10-03 20:49:32,716 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(411)) - number of splits 1
2013-10-03 20:49:32,847 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(423)) - Creating hive-builtins-0.10.0.jar in /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632 with rwxr-xr-x
2013-10-03 20:49:32,850 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(435)) - Extracting /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632/hive-builtins-0.10.0.jar to /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632
2013-10-03 20:49:32,870 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(463)) - Cached file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar as /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar
2013-10-03 20:49:32,880 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:localizePublicCacheObject(486)) - Cached file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar as /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar
2013-10-03 20:49:32,987 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Job running in-process (local Hadoop)
2013-10-03 20:49:33,034 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(340)) - Waiting for map tasks
2013-10-03 20:49:33,037 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(204)) - Starting task: attempt_local1250355097_0001_m_000000_0
2013-10-03 20:49:33,073 INFO mapred.Task (Task.java:initialize(534)) - Using ResourceCalculatorPlugin : null
2013-10-03 20:49:33,077 INFO mapred.MapTask (MapTask.java:updateJobWithSplit(455)) - Processing split: Paths:/user/hive/warehouse/records/users/users.txt:0+24InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
2013-10-03 20:49:33,093 INFO io.HiveContextAwareRecordReader (HiveContextAwareRecordReader.java:initIOContext(154)) - Processing file hdfs://localhost/user/hive/warehouse/records/users/users.txt
2013-10-03 20:49:33,093 INFO mapred.MapTask (MapTask.java:runOldMapper(419)) - numReduceTasks: 1
2013-10-03 20:49:33,099 INFO mapred.MapTask (MapTask.java:<init>(949)) - io.sort.mb = 100
2013-10-03 20:49:33,541 INFO mapred.MapTask (MapTask.java:<init>(961)) - data buffer = 79691776/99614720
2013-10-03 20:49:33,542 INFO mapred.MapTask (MapTask.java:<init>(962)) - record buffer = 262144/327680
2013-10-03 20:49:33,550 INFO ExecMapper (ExecMapper.java:configure(69)) - maximum memory = 2088435712
2013-10-03 20:49:33,551 INFO ExecMapper (ExecMapper.java:configure(74)) - conf classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/]
2013-10-03 20:49:33,551 INFO ExecMapper (ExecMapper.java:configure(76)) - thread classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/]
2013-10-03 20:49:33,585 INFO exec.MapOperator (MapOperator.java:setChildren(387)) - Adding alias users to work list for file hdfs://localhost/user/hive/warehouse/records/users
2013-10-03 20:49:33,587 INFO exec.MapOperator (MapOperator.java:setChildren(402)) - dump TS struct<id:int,name:string>
2013-10-03 20:49:33,588 INFO ExecMapper (ExecMapper.java:configure(91)) -
<MAP>Id =10
<Children>
<TS>Id =0
<Children>
<SEL>Id =1
<Children>
<RS>Id =2
<Parent>Id = 1 null<\Parent>
<\RS>
<\Children>
<Parent>Id = 0 null<\Parent>
<\SEL>
<\Children>
<Parent>Id = 10 null<\Parent>
<\TS>
<\Children>
<\MAP>
2013-10-03 20:49:33,588 INFO exec.MapOperator (Operator.java:initialize(321)) - Initializing Self 10 MAP
2013-10-03 20:49:33,588 INFO exec.TableScanOperator (Operator.java:initialize(321)) - Initializing Self 0 TS
2013-10-03 20:49:33,588 INFO exec.TableScanOperator (Operator.java:initializeChildren(386)) - Operator 0 TS initialized
2013-10-03 20:49:33,589 INFO exec.TableScanOperator (Operator.java:initializeChildren(390)) - Initializing children of 0 TS
2013-10-03 20:49:33,589 INFO exec.SelectOperator (Operator.java:initialize(425)) - Initializing child 1 SEL
2013-10-03 20:49:33,589 INFO exec.SelectOperator (Operator.java:initialize(321)) - Initializing Self 1 SEL
2013-10-03 20:49:33,592 INFO exec.SelectOperator (SelectOperator.java:initializeOp(58)) - SELECT struct<id:int,name:string>
2013-10-03 20:49:33,594 INFO exec.SelectOperator (Operator.java:initializeChildren(386)) - Operator 1 SEL initialized
2013-10-03 20:49:33,595 INFO exec.SelectOperator (Operator.java:initializeChildren(390)) - Initializing children of 1 SEL
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (Operator.java:initialize(425)) - Initializing child 2 RS
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (Operator.java:initialize(321)) - Initializing Self 2 RS
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (ReduceSinkOperator.java:initializeOp(112)) - Using tag = -1
2013-10-03 20:49:33,606 INFO exec.ReduceSinkOperator (Operator.java:initializeChildren(386)) - Operator 2 RS initialized
2013-10-03 20:49:33,606 INFO exec.ReduceSinkOperator (Operator.java:initialize(361)) - Initialization Done 2 RS
2013-10-03 20:49:33,606 INFO exec.SelectOperator (Operator.java:initialize(361)) - Initialization Done 1 SEL
2013-10-03 20:49:33,606 INFO exec.TableScanOperator (Operator.java:initialize(361)) - Initialization Done 0 TS
2013-10-03 20:49:33,607 INFO exec.MapOperator (Operator.java:initialize(361)) - Initialization Done 10 MAP
2013-10-03 20:49:33,637 INFO exec.MapOperator (MapOperator.java:cleanUpInputFileChangedOp(494)) - Processing alias users for file hdfs://localhost/user/hive/warehouse/records/users
2013-10-03 20:49:33,638 INFO exec.MapOperator (Operator.java:forward(774)) - 10 forwarding 1 rows
2013-10-03 20:49:33,638 INFO exec.TableScanOperator (Operator.java:forward(774)) - 0 forwarding 1 rows
2013-10-03 20:49:33,639 INFO exec.SelectOperator (Operator.java:forward(774)) - 1 forwarding 1 rows
2013-10-03 20:49:33,641 INFO ExecMapper (ExecMapper.java:map(148)) - ExecMapper: processing 1 rows: used memory = 114294872
2013-10-03 20:49:33,642 INFO exec.MapOperator (Operator.java:close(549)) - 10 finished. closing...
2013-10-03 20:49:33,643 INFO exec.MapOperator (Operator.java:close(555)) - 10 forwarded 4 rows
2013-10-03 20:49:33,643 INFO exec.MapOperator (Operator.java:logStats(845)) - DESERIALIZE_ERRORS:0
2013-10-03 20:49:33,643 INFO exec.TableScanOperator (Operator.java:close(549)) - 0 finished. closing...
2013-10-03 20:49:33,643 INFO exec.TableScanOperator (Operator.java:close(555)) - 0 forwarded 4 rows
2013-10-03 20:49:33,643 INFO exec.SelectOperator (Operator.java:close(549)) - 1 finished. closing...
2013-10-03 20:49:33,644 INFO exec.SelectOperator (Operator.java:close(555)) - 1 forwarded 4 rows
2013-10-03 20:49:33,644 INFO exec.ReduceSinkOperator (Operator.java:close(549)) - 2 finished. closing...
2013-10-03 20:49:33,644 INFO exec.ReduceSinkOperator (Operator.java:close(555)) - 2 forwarded 0 rows
2013-10-03 20:49:33,644 INFO exec.SelectOperator (Operator.java:close(570)) - 1 Close done
2013-10-03 20:49:33,644 INFO exec.TableScanOperator (Operator.java:close(570)) - 0 Close done
2013-10-03 20:49:33,644 INFO exec.MapOperator (Operator.java:close(570)) - 10 Close done
2013-10-03 20:49:33,645 INFO ExecMapper (ExecMapper.java:close(215)) - ExecMapper: processed 4 rows: used memory = 114767288
2013-10-03 20:49:33,647 INFO mapred.MapTask (MapTask.java:flush(1289)) - Starting flush of map output
2013-10-03 20:49:33,659 INFO mapred.MapTask (MapTask.java:sortAndSpill(1471)) - Finished spill 0
2013-10-03 20:49:33,661 INFO mapred.Task (Task.java:done(858)) - Task:attempt_local1250355097_0001_m_000000_0 is done. And is in the process of commiting
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - hdfs://localhost/user/hive/warehouse/records/users/users.txt:0+24
2013-10-03 20:49:33,668 INFO mapred.Task (Task.java:sendDone(970)) - Task 'attempt_local1250355097_0001_m_000000_0' done.
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(229)) - Finishing task: attempt_local1250355097_0001_m_000000_0
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(348)) - Map task executor complete.
2013-10-03 20:49:33,680 INFO mapred.Task (Task.java:initialize(534)) - Using ResourceCalculatorPlugin : null
2013-10-03 20:49:33,680 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) -
2013-10-03 20:49:33,690 INFO mapred.Merger (Merger.java:merge(408)) - Merging 1 sorted segments
2013-10-03 20:49:33,695 INFO mapred.Merger (Merger.java:merge(491)) - Down to the last merge-pass, with 1 segments left of total size: 70 bytes
2013-10-03 20:49:33,695 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) -
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(100)) - maximum memory = 2088435712
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(105)) - conf classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/]
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(107)) - thread classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/]
2013-10-03 20:49:33,698 INFO ExecReducer (ExecReducer.java:configure(149)) -
<OP>Id =3
<Children>
<FS>Id =4
<Parent>Id = 3 null<\Parent>
<\FS>
<\Children>
<\OP>
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initialize(321)) - Initializing Self 3 OP
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initializeChildren(386)) - Operator 3 OP initialized
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initializeChildren(390)) - Initializing children of 3 OP
2013-10-03 20:49:33,698 INFO exec.FileSinkOperator (Operator.java:initialize(425)) - Initializing child 4 FS
2013-10-03 20:49:33,699 INFO exec.FileSinkOperator (Operator.java:initialize(321)) - Initializing Self 4 FS
2013-10-03 20:49:33,701 INFO exec.FileSinkOperator (Operator.java:initializeChildren(386)) - Operator 4 FS initialized
2013-10-03 20:49:33,701 INFO exec.FileSinkOperator (Operator.java:initialize(361)) - Initialization Done 4 FS
2013-10-03 20:49:33,701 INFO exec.ExtractOperator (Operator.java:initialize(361)) - Initialization Done 3 OP
2013-10-03 20:49:33,706 INFO ExecReducer (ExecReducer.java:reduce(243)) - ExecReducer: processing 1 rows: used memory = 117749816
2013-10-03 20:49:33,707 INFO exec.ExtractOperator (Operator.java:forward(774)) - 3 forwarding 1 rows
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(458)) - Final Path: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000/000000_0
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(460)) - Writing to temp file: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_task_tmp.-ext-10000/_tmp.000000_0
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(481)) - New Final Path: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000/000000_0
2013-10-03 20:49:33,737 INFO ExecReducer (ExecReducer.java:close(301)) - ExecReducer: processed 4 rows: used memory = 118477400
2013-10-03 20:49:33,737 INFO exec.ExtractOperator (Operator.java:close(549)) - 3 finished. closing...
2013-10-03 20:49:33,737 INFO exec.ExtractOperator (Operator.java:close(555)) - 3 forwarded 4 rows
2013-10-03 20:49:33,737 INFO exec.FileSinkOperator (Operator.java:close(549)) - 4 finished. closing...
2013-10-03 20:49:33,737 INFO exec.FileSinkOperator (Operator.java:close(555)) - 4 forwarded 0 rows
2013-10-03 20:49:33,990 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-10-03 20:49:34,011 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - 2013-10-03 20:49:34,011 null map = 0%, reduce = 0%
2013-10-03 20:49:34,111 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000/000000
2013-10-03 20:49:34,143 INFO exec.FileSinkOperator (Operator.java:logStats(845)) - TABLE_ID_1_ROWCOUNT:4
2013-10-03 20:49:34,143 INFO exec.ExtractOperator (Operator.java:close(570)) - 3 Close done
2013-10-03 20:49:34,145 INFO mapred.Task (Task.java:done(858)) - Task:attempt_local1250355097_0001_r_000000_0 is done. And is in the process of commiting
2013-10-03 20:49:34,146 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - reduce > reduce
2013-10-03 20:49:34,147 INFO mapred.Task (Task.java:sendDone(970)) - Task 'attempt_local1250355097_0001_r_000000_0' done.
2013-10-03 20:49:35,026 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - 2013-10-03 20:49:35,026 null map = 0%, reduce = 100%
2013-10-03 20:49:35,030 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Ended Job = job_local1250355097_0001
2013-10-03 20:49:35,033 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1361)) - Moving tmp dir: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000 to: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000.intermediate
2013-10-03 20:49:35,036 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1372)) - Moving tmp dir: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000.intermediate to: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000
答案 0 :(得分:0)
我无法重现:
hive> INSERT OVERWRITE TABLE bucketed_users SELECT * FROM unbucketed_users;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1384565454792_0070, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1384565454792_0070/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1384565454792_0070
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2013-11-16 05:04:12,290 Stage-1 map = 0%, reduce = 0%
2013-11-16 05:04:33,868 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.16 sec
MapReduce Total cumulative CPU time: 7 seconds 160 msec
Ended Job = job_1384565454792_0070
Loading data to table default.bucketed_users
rmr: DEPRECATED: Please use 'rm -r' instead.
Moved: 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/bucketed_users' to trash at: hdfs://sandbox.hortonworks.com:8020/user/hue/.Trash/Current
Table default.bucketed_users stats: [num_partitions: 0, num_files: 4, num_rows: 0, total_size: 24, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 4 Cumulative CPU: 7.16 sec HDFS Read: 259 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 160 msec
OK
Time taken: 19.291 seconds
hive> dfs -ls /apps/hive/warehouse/bucketed_users;
Found 4 items
-rw-r--r-- 3 hue hdfs 12 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000000_0
-rw-r--r-- 3 hue hdfs 0 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000001_0
-rw-r--r-- 3 hue hdfs 6 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000002_0
-rw-r--r-- 3 hue hdfs 6 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000003_0
你看到转换为MapJoin是很奇怪的,你不应该看到它,因为你的查询没有连接。这真的是你正在运行的查询吗?如果您看到我建议:
hive.auto.convert.join=false;
如果修复它,你应该提交一个错误。
答案 1 :(得分:0)
奇怪,这对我有用,但是既然你指定你的表被排序,你还需要设置
set hive.enforce.sorting=true;
in addition of
set hive.enforce.bucketing = true;
我想知道桶/排序表的组合是否只设置强制设置之一是否会以某种方式混淆它。