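I'm running into a problem with joins in Hive. Joining two large tables produces one huge file and many small files in the output. Here is my scenario in detail: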
My HQL is:
INSERT OVERWRITE TABLE FINAL_TABLE
SELECT
a.x_address_full,
b1.lat,
b1.long,
a.y_address_full,
b2.lat,
b2.long,
a.z_address_full,
b3.lat,
b3.long
FROM
TABLE_A a
LEFT JOIN TABLE_B b1 ON a.x_address = b1.address_key
LEFT JOIN TABLE_B b2 ON a.y_address = b2.address_key
LEFT JOIN TABLE_B b3 ON a.z_address = b3.address_key
In the File Browser (Hue), I can see that about 50 part files were created:
-rwxr-xr-x 3 user user 103488475533 2015-09-08 20:18 FINAL_TABLE/000000_0
-rwxr-xr-x 3 user user 18887004 2015-09-08 16:43 FINAL_TABLE/000001_0
-rwxr-xr-x 3 user user 16806648 2015-09-08 16:43 FINAL_TABLE/000002_0
-rwxr-xr-x 3 user user 17759878 2015-09-08 16:43 FINAL_TABLE/000003_0
-rwxr-xr-x 3 user user 19229971 2015-09-08 16:43 FINAL_TABLE/000004_0
-rwxr-xr-x 3 user user 17361505 2015-09-08 16:43 FINAL_TABLE/000005_0
-rwxr-xr-x 3 user user 20935119 2015-09-08 16:43 FINAL_TABLE/000006_0
-rwxr-xr-x 3 user user 18525756 2015-09-08 16:43 FINAL_TABLE/000007_0
-rwxr-xr-x 3 user user 18155867 2015-09-08 16:43 FINAL_TABLE/000008_0
-rwxr-xr-x 3 user user 18388192 2015-09-08 16:43 FINAL_TABLE/000009_0
-rwxr-xr-x 3 user user 17352032 2015-09-08 16:43 FINAL_TABLE/000010_0
-rwxr-xr-x 3 user user 20586196 2015-09-08 16:43 FINAL_TABLE/000011_0
-rwxr-xr-x 3 user user 19026628 2015-09-08 16:43 FINAL_TABLE/000012_0
-rwxr-xr-x 3 user user 18492712 2015-09-08 16:43 FINAL_TABLE/000013_0
-rwxr-xr-x 3 user user 20525139 2015-09-08 16:43 FINAL_TABLE/000014_0
-rwxr-xr-x 3 user user 18767626 2015-09-08 16:43 FINAL_TABLE/000015_0
-rwxr-xr-x 3 user user 18759833 2015-09-08 16:43 FINAL_TABLE/000016_0
-rwxr-xr-x 3 user user 17625431 2015-09-08 16:43 FINAL_TABLE/000017_0
-rwxr-xr-x 3 user user 17589284 2015-09-08 16:43 FINAL_TABLE/000018_0
-rwxr-xr-x 3 user user 19635568 2015-09-08 16:43 FINAL_TABLE/000019_0
-rwxr-xr-x 3 user user 18782632 2015-09-08 16:43 FINAL_TABLE/000020_0
-rwxr-xr-x 3 user user 18468366 2015-09-08 16:43 FINAL_TABLE/000021_0
-rwxr-xr-x 3 user user 19348518 2015-09-08 16:43 FINAL_TABLE/000022_0
-rwxr-xr-x 3 user user 19132130 2015-09-08 16:43 FINAL_TABLE/000023_0
-rwxr-xr-x 3 user user 19661123 2015-09-08 16:43 FINAL_TABLE/000024_0
Note: all tables are AvroSerDe-based.
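Based on my analysis so far, this looks like it could be caused by skew in the x_address, y_address, or z_address field, where a single value accounts for a disproportionate share of the rows. The first file (000000_0) is about 103 GB and was written at 20:18, while the others are roughly 17 to 21 MB and were all written at 16:43, which suggests that one task received almost all of the data.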
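One way to test the skew hypothesis is to count the most frequent values of each join key. The HiveQL below is a minimal sketch, not something taken from the original post: it reuses the table and column names from the query above, shows only x_address (the same query would be repeated for y_address and z_address), and adds a second query that looks for duplicate address_key values in TABLE_B, since every duplicate multiplies the matching output rows.

```sql
-- Diagnostic sketch (hypothetical, not from the original post):
-- the 20 most frequent join-key values in TABLE_A. NULL is counted as its own group.
SELECT x_address, COUNT(*) AS cnt
FROM TABLE_A
GROUP BY x_address
ORDER BY cnt DESC
LIMIT 20;

-- Repeat the query above for y_address and z_address.

-- Duplicate keys in TABLE_B fan out the LEFT JOIN: each TABLE_A row
-- is emitted once per matching TABLE_B row.
SELECT address_key, COUNT(*) AS cnt
FROM TABLE_B
GROUP BY address_key
HAVING COUNT(*) > 1
ORDER BY cnt DESC
LIMIT 20;
```

If NULL or an empty string dominates the first result, all of those rows share one join-key value and are sent to the same reducer in a common (shuffle) join, which would be consistent with one oversized output file.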
Any ideas?