I have a Hive table that is partitioned by date and product type.
I need to update all values of product_type from 'S' to 'T' (shirts to tops). Our version of Hive does not support UPDATE statements, so a direct update is not possible.
The table contains data like this:

product_id, sale_id, date, product_type
42342423, 43423, 2017-01-01, S
67867868, 23233, 2017-01-01, C
53453466, 63423, 2017-02-01, S

Other solutions posted for questions of this type involve creating a new table and populating it with an insert overwrite statement combined with, for example, a case expression.
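For a non-partition column, that rewrite approach would look roughly like this (a sketch only; product_new is a hypothetical staging table, and the column names follow the example above):

```sql
-- Rebuild the data, rewriting product_type on the way through.
-- This only helps when product_type is a regular column.
INSERT OVERWRITE TABLE product_new
SELECT
    product_id,
    sale_id,
    `date`,
    CASE WHEN product_type = 'S' THEN 'T' ELSE product_type END AS product_type
FROM product;
```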
But that approach does not work when the column to be updated is a partition column.
Is there another way to solve this?
Answer (score: 0)
A partition column's "data" is really metadata attached to a directory.
If you already have 'T' directories, move the files from each date + product_type='S' directory to the corresponding date + product_type='T' directory.
If you do not have 'T' directories, simply rename the 'S' directories and update the partition list.
Demo
hive> select * from product;
OK
67867868 23233 2017-01-01 C
42342423 43423 2017-01-01 S
53453466 63423 2017-01-02 S
[training@localhost ~]$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive 0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S
-rwxrwxrwx 1 training hive 15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0
[training@localhost ~]$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-01/product_type=T
[training@localhost ~]$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-02/product_type=T
[training@localhost ~]$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0
[training@localhost ~]$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0
[training@localhost ~]$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive 0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-01
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=S
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=T
-rwxrwxrwx 1 training hive 15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-02
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=S
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=T
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0
hive> msck repair table product;
OK
Partitions not in metastore: product:date=2017-01-01/product_type=T product:date=2017-01-02/product_type=T
Repair: Added partition to metastore product:date=2017-01-01/product_type=T
Repair: Added partition to metastore product:date=2017-01-02/product_type=T
Time taken: 0.409 seconds, Fetched: 3 row(s)
hive> select * from product;
OK
67867868 23233 2017-01-01 C
42342423 43423 2017-01-01 T
53453466 63423 2017-01-02 T
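Instead of scanning the whole table directory with msck repair, the moved directories could also be registered explicitly (a sketch using the same partition values as the demo; the DROP statements assume the emptied 'S' directories should no longer appear as partitions):

```sql
-- Register the new 'T' directories as partitions.
ALTER TABLE product ADD PARTITION (`date`='2017-01-01', product_type='T');
ALTER TABLE product ADD PARTITION (`date`='2017-01-02', product_type='T');
-- Remove the now-empty 'S' partitions from the metastore.
ALTER TABLE product DROP PARTITION (`date`='2017-01-01', product_type='S');
ALTER TABLE product DROP PARTITION (`date`='2017-01-02', product_type='S');
```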