Question

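I have an original table partitioned on YEAR, MONTH, and DATE. For example:

col_1    col_2    col_3    YEAR    MONTH    DATE
a        b        c        2017    03       25

I want to create a new table that is a subset of this table but still retains the original table's partitioning.

Something as simple as:

CREATE TABLE new_table AS
SELECT *
FROM original_table
WHERE (conditions);

However, because the original table is too large, I have to run this query one partition at a time.

My current solution is a shell script that iterates over all partitions and runs a separate query for each one.

Example:

echo "Do you want to input a number or a string?"
read -r input

if [[ "$input" == "number" ]] || [[ "$input" == "Number" ]] || [[ "$input" == "NUMBER" ]]; then
        echo "Number selected 1"
elif [[ "$input" == "String" ]] || [[ "$input" == "STRING" ]] || [[ "$input" == "string" ]]; then
        echo "String selected"
        echo "Please give me the string to be XOR'ed"
        read -r convertthis
        # Dump the string as bits into bin-store
        echo "$convertthis" | xxd -bi > bin-store
        # Strip the offset column from the dump
        sed -i -e 's/00000000://g' bin-store
        # Strip the plain-text column (double quotes so $convertthis expands)
        sed -i -e "s/$convertthis.//g" bin-store
else
        echo "Please re-run the script, input is wrong"
fi

But this seems very roundabout and inefficient. Is there a way to do this directly in Hive?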

Answer 1

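I had to do something like this just a few days ago. It requires you to copy the original table's schema, or at least use CREATE TABLE LIKE.

Most importantly, though, your INSERT statement needs to specify the partitions: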

CREATE TABLE new_table (
    fields... 
)
PARTITIONED BY (year STRING, month STRING, day  STRING);

INSERT OVERWRITE TABLE new_table PARTITION(year, month, day) 
SELECT fields... , year, month, day -- partitions must be last
FROM original_table
WHERE 
year BETWEEN '2016' AND '2017';  -- add more, as necessary
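
A fully dynamic-partition insert like the one above usually only runs once dynamic partitioning is enabled and the partition mode is set to nonstrict. The following is a minimal sketch of the same approach using CREATE TABLE LIKE. It assumes, as in the answer, that original_table is partitioned by (year, month, day); the table names and the WHERE clause are placeholders.

```sql
-- Allow INSERT to create partitions from the query result (session-level settings).
SET hive.exec.dynamic.partition=true;
-- nonstrict mode permits every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy the schema, including the partition columns, without copying any data.
CREATE TABLE new_table LIKE original_table;

-- SELECT * on a partitioned table returns the partition columns last,
-- which is the order the PARTITION clause expects.
INSERT OVERWRITE TABLE new_table PARTITION (year, month, day)
SELECT *
FROM original_table
WHERE year BETWEEN '2016' AND '2017';  -- placeholder filter
```

If the query touches many partitions, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may also need to be raised.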

You could also use CTAS, but doing that with partitioned tables is not straightforward.

Hive copy from partitioned table