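I am trying to create a bucketed table in Hive on Cloudera, but what gets created is an ordinary table without any buckets.
First, using the Hive CLI, I created an ordinary table named marks_temp:
CREATE TABLE marks_temp(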
id INT,
Name string,
mark int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
I loaded the following data from the text file 'Desktop/Data/littlebigdata.txt' into the marks_temp table:
101,Firdaus,88
102,Pranav,78
103,Rahul,65
104,Sanjoy,65
105,Firdaus,88
106,Pranav,78
107,Rahul,65
108,Sanjoy,65
109,Amar,54
110,Sahil,34
111,Rahul,45
112,Rajnish,67
113,Ranjeet,56
114,Sanjoy,34
I loaded the data above with the following command:
LOAD DATA LOCAL INPATH 'Desktop/Data/littlebigdata.txt'
INTO TABLE marks_temp;
After the data loaded successfully, I created a bucketed table named marks_bucketed:
CREATE TABLE marks_bucketed(
id INT,
Name string,
mark int
)
CLUSTERED BY (id) INTO 4 BUCKETS;
Now I insert the data from the marks_temp table into the marks_bucketed table:
INSERT INTO marks_bucketed
SELECT id,Name, mark FROM marks_temp;
After this, a job starts running. In the job log I noticed the message "Number of reduce tasks is set to 0 since there's no reduce operator":
hive> insert into marks_bucketed
> select id,Name,mark from marks_temp;
Query ID = cloudera_20180601035353_29b25ffe-541e-491e-aea6-b36ede88ed79
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1527668582032_0004, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1527668582032_0004/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1527668582032_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-06-01 03:54:01,328 Stage-1 map = 0%, reduce = 0%
2018-06-01 03:54:14,444 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.21 sec
MapReduce Total cumulative CPU time: 2 seconds 210 msec
Ended Job = job_1527668582032_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/marks_bucketed/.hive-staging_hive_2018-06-01_03-53-45_726_2788383119636056364-1/-ext-10000
Loading data to table default.marks_bucketed
Table default.marks_bucketed stats: [numFiles=1, numRows=14, totalSize=194, rawDataSize=180]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.21 sec HDFS Read: 3937 HDFS Write: 273 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 210 msec
OK
Time taken: 31.307 seconds
Even the Hue File Browser shows only one file for the table. A screenshot is attached (Hue File Browser view of the marks_bucketed table).
Answer 0 (score: 1)
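From the Hive documentation:
Version 0.x and 1.x only
The command set hive.enforce.bucketing = true; allows the correct number of reducers and the cluster by column to be automatically selected based on the table. Otherwise, you would need to set the number of reducers to be the same as the number of buckets as in set mapred.reduce.tasks = 256; and have a CLUSTER BY ... clause in the select.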
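So you need to either set the property that enforces bucketing, or take the manual route and run the query with the reducer count set to the number of buckets: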
set mapred.reduce.tasks = 4;
insert into marks_bucketed
select id,Name,mark from marks_temp cluster by id;
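For the first option, a minimal sketch for this table could look like the following. It assumes Hive 0.x or 1.x, as the quoted documentation states; as far as I know the property no longer exists in Hive 2.x, where bucketing is always enforced. INSERT OVERWRITE is my own assumption here, chosen so that the single unbucketed file written by the earlier insert is replaced rather than appended to. The warehouse path is the one shown in the job log in the question.

```sql
-- Let Hive choose the reducer count and clustering column from the table definition
SET hive.enforce.bucketing = true;

-- Rewrite the table contents; with 4 buckets a reduce stage should now be planned
INSERT OVERWRITE TABLE marks_bucketed
SELECT id, Name, mark FROM marks_temp;

-- List the table directory from the Hive CLI to check how many files were written
dfs -ls /user/hive/warehouse/marks_bucketed;
```

If bucketing took effect, the job log should report reducers instead of "number of reducers: 0", and the directory listing would be expected to show one file per bucket (four here) rather than a single file.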